EP 1 504 117 B1

Description 

[0001] A process for generating sequence-specific proteases by screening-based directed evolution is disclosed. The process provides proteases that recognize and cleave user-definable amino-acid sequences with high sequence specificity. Proteases obtainable by the process can be used in a variety of medical, diagnostic and industrial applications.

Background of the Invention 

[0002] Proteolytic enzymes, or proteases, occupy an outstanding position among enzymes, since the reaction they catalyze is the cleavage of peptide bonds in other proteins. Proteases are not only very common in nature but are also among the most important enzymes for medical and industrial use. Of the total worldwide sales of enzymes, estimated at more than USD 1 billion per year, proteases account for approximately 60 %. Based on the functional group present at the active site, proteases are classified into four groups, i.e., serine proteases (EC 3.4.21), cysteine proteases (EC 3.4.22), aspartic proteases (EC 3.4.23), and metalloproteases (EC 3.4.24). Assignment to one of the four groups is typically made by experimentally determining the sensitivity towards different types of protease inhibitors. The proteases of the four groups also differ in their biochemical properties. For example, serine proteases are sensitive to the inhibitors 3,4-DCI, DFP, PMSF and TLCK, and have a pH optimum between pH 7 and 11. Aspartic proteases are inhibited by pepstatin, DAN and EPNP, and predominantly have a pH optimum between pH 3 and 4. Cysteine proteases are sensitive to sulfhydryl inhibitors such as PCMB and, with a few exceptions, have neutral pH optima. Metalloproteases require a divalent metal ion for their activity; they are therefore inhibited by chelating agents such as EDTA, and have neutral or alkaline pH optima. Within these four groups, further classification is usually based on structural similarities.
[0003] Besides such a combined biochemical and structural classification, proteases can be grouped according to their substrate spectrum. The two most general groups are exoproteases and endoproteases. Exoproteases cleave peptide bonds only at the very end of a peptide, whereas endoproteases catalyze the cleavage of bonds anywhere in a peptide strand. The specificity of proteases, i.e. their ability to recognize and hydrolyze certain peptide substrates specifically while others remain uncleaved, can be expressed qualitatively and quantitatively. Qualitative specificity refers to the kind of amino acid residues that a protease accepts at certain positions of the peptide substrate. For example, trypsin and the tissue-type plasminogen activator are related with respect to their qualitative specificity, since both require an arginine or a similar residue at position P1 (substrate positions are named according to the nomenclature of Schechter & Berger, Biochem. Biophys. Res. Commun. 27 (1967) 157-162). Quantitative specificity, on the other hand, refers to the relative number of peptide substrates that are accepted. It can be expressed by the term

$$ s = -\log_{10} Q, $$

where Q is the number of accepted peptide substrates divided by the number of all possible peptide substrates. Quantitative specificities of several proteases are shown in Table I. The calculation is based on the twenty naturally occurring amino acids and on the assumption that all combinations of these twenty amino acids are feasible. Consequently, proteases that accept only a small portion of all possible peptides have a high specificity, whereas the specificity of a protease that, as an extreme, cleaves any peptide substrate would theoretically be zero.
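The definition of s can be checked against Table I with a short calculation. The Python sketch below assumes, as the text does, that all twenty amino acids are equally possible at every position, so that a position accepting k residues contributes a factor k/20 to Q; positions marked X contribute a factor of 1 and are simply left out. The function name and the input format (accepted residues at one position separated by "/") are illustrative choices, not part of the original.

```python
import math

def quantitative_specificity(requirements: list[str], alphabet_size: int = 20) -> tuple[float, float]:
    """Return (Q, s) with Q = accepted/possible substrates and s = -log10(Q).

    Each entry of `requirements` lists the residues accepted at one restricted
    position, e.g. "K/R"; unrestricted (X) positions are omitted.
    """
    q = 1.0
    for allowed in requirements:
        q *= len(allowed.split("/")) / alphabet_size  # fraction accepted at this position
    return q, -math.log10(q)

# Restricted positions taken from Table I
table_i = {
    "Chymotrypsin": ["F/Y/W"],
    "Trypsin": ["K/R"],
    "Pepsin": ["F/Y/L", "W/F/Y"],
    "TEV": ["E", "Y", "Q", "S/G"],
    "t-PA": ["C", "P", "G", "R", "V", "V", "G"],
}

for name, req in table_i.items():
    q, s = quantitative_specificity(req)
    print(f"{name:12s} Q = {q:.2E}  s = {s:.2f}")
```

Running it should reproduce the Q and s columns of Table I for these rows (for example, Q = 7.81E-10 and s = 9.11 for t-PA). Because Q is a product of independent per-position factors, s is simply the sum of the per-position terms -log10(k/20).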


Table I: Quantitative specificities of different proteases

Protease      Substrate requirements                                Quantitative specificity
              P6  P5  P4       P3  P2     P1     P1'    P2'  P3'    Q           s = -log Q
              X   X   X        X   X      X      X      X    X      1.00E+00    0
Chymotrypsin  X   X   X        X   X      F/Y/W  X      X    X      1.50E-01    0.82
Papain        X   X   X        X   F/V/L  X      X      X    X      1.50E-01    0.82
Trypsin       X   X   X        X   X      K/R    X      X    X      1.00E-01    1.00
Pepsin        X   X   X        X   X      F/Y/L  W/F/Y  X    X      2.25E-02    1.65
TEV           E   X   X        Y   X      Q      S/G    X    X      1.25E-05    4.90
Plasmin       X   X   K/V/I/F  X   F/Y/W  R/K    N      A    X      7.50E-06    5.12
Thrombin      X   X   L/I/V/F  X   P      R      N      A    X      1.25E-06    5.90
t-PA          X   X   C        P   G      R      V      V    G      7.81E-10    9.11

(Amino acid residues are abbreviated as shown in Table II. X refers to any amino acid residue.)



[0004] The quantitative specificity of proteases varies over a wide range. Very unspecific proteases are known, such as papain, which cleaves all polypeptides that contain a phenylalanine, a valine or a leucine residue (s = 0.82), or trypsin, which cleaves all polypeptides that contain an arginine or a lysine residue (s = 1.0). On the other hand, highly specific proteases are also known, such as the tissue-type plasminogen activator (t-PA), which cleaves plasminogen only at a single specific sequence (s = 9.11). Proteases with high substrate specificity play an important role in the regulation of protein functions in living organisms. The specific cleavage of polypeptide substrates, for example, activates precursor proteins or deactivates active proteins or enzymes, thereby regulating their functions. Several proteases with high substrate specificities are used in medical applications. Pharmaceutical examples of activation or deactivation by cleavage of specific polypeptide substrates are the application of t-PA in acute cardiac infarction, where it activates plasminogen to dissolve fibrin clots, and the application of Ancrod in stroke, where it deactivates fibrinogen, thereby decreasing blood viscosity and enhancing its transport capacity. While t-PA is a human protease whose activity is necessary for the regulation of human blood, Ancrod is a non-human protease. It was isolated from the viper Agkistrodon rhodostoma and is the main ingredient of the snake's venom. Thus, a few non-human proteases with therapeutic applicability exist. Their identification, however, has usually been largely incidental.

[0005] The treatment of diseases by administering drugs is typically based on a molecular mechanism in which the drug activates or inactivates a specific protein function in the patient's body, be it an endogenous protein or a protein of an infecting microbe or virus. While the action of chemical drugs on these targets is still difficult to understand or predict, protein drugs are able to recognize these target proteins specifically among millions of other proteins. Prominent examples of proteins with the intrinsic ability to recognize other proteins are antibodies, receptors, and proteases. Although there is a huge number of potential target proteins, only very few proteases are available today to address them. Owing to their proteolytic activity, proteases are particularly suited for the inactivation or activation of protein targets. Even when only human proteins are considered, the number of potential target proteins is enormous. It is estimated that the human genome comprises between 30,000 and 100,000 genes, each of which encodes a different protein. Many of these proteins are involved in human diseases and are therefore potential pharmaceutical targets. Proteases that recognize and cleave these target proteins with high specificity are consequently of high value as potential drugs. The medical application of such proteases, however, is limited by their natural occurrence. For example, there are theoretically more than 25 billion (20^8) different possibilities for a specificity of s = 10.4, which corresponds to the specific recognition of a unique sequence of eight amino acid residues. It is highly unlikely that a protease with one particular qualitative specificity of this kind can be found by screening natural isolates.
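The numbers in this example follow directly from the definition of s. Assuming, as for Table I, that all 20^n sequences of length n are possible and that exactly one of them is accepted, Q = 20^-n and s = n · log10(20), i.e. roughly 1.3 per specified residue. A minimal Python check (the function name is illustrative):

```python
import math

AMINO_ACIDS = 20  # naturally occurring amino acids

def unique_sequence_specificity(length: int) -> tuple[int, float]:
    """Number of possible sequences of `length` residues, and s for a protease
    that accepts exactly one of them."""
    possible = AMINO_ACIDS ** length
    s = -math.log10(1 / possible)  # equivalently length * log10(20)
    return possible, s

possible, s = unique_sequence_specificity(8)
print(f"{possible:,} possible octapeptides, s = {s:.1f}")
# prints: 25,600,000,000 possible octapeptides, s = 10.4
```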

[0006] Selection systems for proteases of known specificity are known in the art, for instance from Smith et al., Proc. Natl. Acad. Sci. USA, Vol. 88 (1991). In the example described there, the system comprises the yeast transcription factor GAL4 as the selectable marker, into which a defined, cleavable target sequence is inserted, in combination with the TEV protease. Cleavage separates the DNA-binding domain from the transcription activation domain and thereby renders the transcription factor inactive. The resulting inability of the cells to metabolize galactose can be detected by a colorimetric assay or by selection on the suicide substrate 2-deoxygalactose.

[0007] Further, selection may be performed using peptide substrates that carry modifications such as fluorogenic moieties based on groups such as ACC, as previously described by Harris et al. (US 2002/022243).
[0008] Laboratory techniques for generating proteolytic enzymes with altered sequence specificities are known in principle. They can be classified by their expression and selection systems. In genetic selection, a protease is produced within an organism; if the protease is able to cleave a precursor protein, this in turn alters the growth behavior of the producing organism. From a population of organisms expressing different proteases, those with an altered growth behavior can be selected. This principle was reported by Davis et al. (US 5258289, WO 96/21009). Phage production in this system depends on the cleavage of a phage protein, which can only be activated in the presence of a proteolytic enzyme or antibody that is able to cleave it. Selected proteolytic enzymes or antibodies would thus have the ability to cleave an amino acid sequence that activates phage production. However, there is no control over the specificity of the proteases that are selected: the system does not select for proteases with low activity on peptides other than the peptide substrate used. Additionally, this system does not allow a precise characterization of the kinetic constants (kcat, KM) of the selected proteases. Several other systems with intracellular protease expression have been reported, but they all suffer from the disadvantages mentioned above. Some of them use a genetic reporter system that allows selection by screening instead of genetic selection, but they also cannot overcome the intrinsic insufficiency of characterizing proteases intracellularly.

[0009] A system for generating proteolytic enzymes with altered sequence specificities using membrane-bound proteases has also been reported. Iverson et al. (WO 98/49286) describe an expression system for a membrane-bound protease that is displayed on the cell surface. An essential element of the experimental design is that the catalytic reaction has to be performed at the cell surface, i.e., the substrates and products must remain associated with the bacterium expressing the enzyme at its surface. This restriction limits the generation of proteolytic enzymes with altered sequence specificities and does not allow a precise characterization of the kinetic constants (kcat, KM) of the selected proteases. Furthermore, the method does not allow control of the position at which the peptide is cleaved. Additionally, positively identified proteases will be able to cleave a certain amino acid (aa) sequence, but they may also cleave many other aa sequences. Therefore, there is no control over the specificity of the proteases that are selected.

[0010] A system for generating proteolytic enzymes with altered sequence specificities using self-secreting proteases is also known. Duff et al. (WO 98/11237) describe an expression system for a self-secreting protease. An essential element of the experimental design is that the catalytic reaction acts on the protease itself: autoproteolytic processing of the membrane-bound precursor molecule releases the mature protease from the cellular membrane into the extracellular environment. Therefore, a fusion protein must be constructed in which the target peptide sequence replaces the natural autoproteolytic cleavage site. A limitation of such a system is that positively identified proteases will be able to cleave a certain aa sequence, but they may also cleave many other peptide sequences. Therefore, high substrate specificity cannot be achieved with such an approach. Additionally, such a system cannot ensure that selected proteases cleave at a specific position in a defined aa sequence, and it does not allow a precise characterization of the kinetic constants (kcat, KM) of the selected proteases.

[0011] Broad et al. (WO 99/11801) disclose a heterologous cell system suitable for altering the specificity of proteases. The system comprises a transcription factor precursor in which the transcription factor is linked to a membrane-anchoring domain via a protease cleavage site. Cleavage at this site by a protease releases the transcription factor, which in turn initiates the expression of a target gene under the control of the respective promoter. To alter the specificity, protease cleavage sites with modified sequences are inserted and the protease is subjected to mutagenesis. New proteases obtained in this way may be able to recognize the modified sequence, which is monitored by the expression of the target gene. Such a system likewise does not allow a precise control of the biochemical properties of the selected proteases.

[0012] Most of these approaches apply methods of directed evolution to generate proteolytic enzymes with altered sequence specificities. Several different mutation and recombination methods for generating genetic libraries have been reported and are described elsewhere. All these methods suffer from the lack of a precise selection of positive protease variants from large libraries. First, they cannot distinguish between single and multiple turnovers of peptide substrates, which is necessary to prevent the selection of variants with low kcat. Second, it is not possible to adjust enzyme and substrate concentrations so as to select protease variants with lower KM. Third, none of these systems allows the selection of a protease with increased activity on the desired peptide substrate and, at the same time, decreased activity on the original peptide substrate.
[0013] Methods that fulfill the three selection criteria mentioned above (kcat, KM and substrate specificity) for generating proteolytic enzymes with high sequence specificity by screening-based directed evolution have heretofore not been available.

Summary of the Invention 


[0014] Thus, the technical problem underlying the present invention is to provide a method for generating new proteases with user-defined substrate specificities by directed evolution. In particular, the invention is directed to a method for evolving novel proteases towards the selective recognition and cleavage of specific amino-acid sequences only. This technical problem has been solved by the embodiments of the invention specified below and in the appended claims. The present invention is thus directed to

(1) a method for identifying sequence-specific proteases with target substrate specificities, which comprises the following steps:

(a) providing a population of proteases comprising variants of one first protease or variants or chimeras of two or more first proteases, said first proteases having a substrate specificity for a particular amino acid sequence of a first peptide substrate;

(b) contacting said population of proteases with one or more second substrates comprising at least one specific amino acid sequence that resembles the amino acid sequence of the target peptide substrate but is not present within the first peptide substrate; and

(c) selecting one or more protease variants from the population of proteases provided in step (a) having specificity for said specific amino acid sequence of the second substrates provided in step (b), under conditions that allow identification of proteases that preferentially recognize and hydrolyse said specific amino acid sequence within the second substrates, wherein the screening for protease activity is carried out by adding an excess of peptides other than the second peptide, so that the added peptides act as competitors;

(2) in a preferred embodiment of (1) above, only one second substrate is used in the one or more cycles (a) to (c), i.e., the second substrate is identical with the target substrate;

(3) in a further preferred embodiment of (1) above, different second substrates are used, the second substrates have an intermediate character with regard to the first substrate and the target substrate, and the last second substrate used is identical with the target substrate; and

(4) in a particularly preferred embodiment of (1) to (3) above, the target protease has a specificity similar to tissue-type plasminogen activator and cleaves the target substrate CPGRVVGG.






[0015] The identification and selection of proteases that have evolved towards the target specificity is done by screening for catalytic activities on different peptide substrates: by screening for increased affinity, by comparing the activities on two substrates, by using unspecific peptides as competitors, or by using intermediate peptide substrates.
The following detailed description discloses the preferred features, advantages and utility of the present invention.
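The competitor principle can be illustrated with the standard rate law for an enzyme acting on several substrates that compete for the free enzyme. In the Python sketch below, the excess of unspecific competitor peptides is lumped into a single species with an apparent Michaelis constant; this lumping, the Variant class and all numerical values are hypothetical and serve only to show the trend.

```python
from dataclasses import dataclass

@dataclass
class Variant:
    name: str
    kcat: float           # catalytic rate constant on the target substrate (1/s)
    km: float             # Michaelis constant for the target substrate (M)
    km_competitor: float  # apparent Michaelis constant for the pooled competitor peptides (M)

def target_rate(v: Variant, e0: float, s: float, c: float) -> float:
    """Initial rate (M/s) of target cleavage when competitors at concentration c
    bind the free enzyme in parallel with the target substrate at concentration s."""
    return v.kcat * e0 * (s / v.km) / (1.0 + s / v.km + c / v.km_competitor)

specific = Variant("specific", kcat=1.0, km=1e-5, km_competitor=1e-2)
promiscuous = Variant("promiscuous", kcat=1.0, km=1e-5, km_competitor=1e-5)

for c in (0.0, 1e-4):  # no competitors, then a 100-fold excess over the target
    rates = {v.name: target_rate(v, e0=1e-9, s=1e-6, c=c) for v in (specific, promiscuous)}
    print(c, rates)
```

Without competitors the two hypothetical variants are indistinguishable on the target substrate; with a 100-fold excess of competitor peptides, the variant that also binds the competitors well is slowed about tenfold in this example, so ranking by target-cleavage rate favours the more specific variant.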

Brief Description of the Figures 

[0016] The following figures are provided in order to explain further the present invention in supplement to the detailed 
description: 



Figure 1 depicts schematically the two alternatives A and B of the method of the invention.

Figure 2 distinguishes the two alternatives A and B of the method of the invention by showing schematically the qualitative and quantitative changes in specificity during evolution towards the target specificity.

Figure 3 illustrates schematically how proteases with changed catalytic activities are evolved using the two alternatives A and B of the method of the invention.

Figure 4 depicts schematically, in two different forms, the intermediate approach as one particular aspect of the invention that uses intermediate substrates.

Figure 5 illustrates schematically how, according to the invention, proteases with changed catalytic activities are evolved using the intermediate approach.

Figure 6 shows an example of an expression vector for S. cerevisiae that can be used for the method of the invention.

Figure 7 shows an example of the hydrolysis of a peptide substrate by the tobacco etch virus protease.

Figure 8 shows an example of a distribution of catalytic activities obtained by screening using confocal fluorescence spectroscopy.

Figure 9 shows an example of the decrease in KM during evolution towards higher affinity.

Figure 10 shows an example of the change in specificity during evolution of proteases towards the specificity of t-PA.

Figure 11 depicts schematically a preferred variant of the intermediate approach.

Figure 12 shows an example of the time-dependent substrate conversion of a starting protease in comparison to one of the evolved variants.

Detailed Description of the Invention

[0017] In the framework of this invention the following terms and definitions are used. 

[0018] The term "protease" means any protein molecule acting in the hydrolysis of peptide bonds. It includes naturally occurring proteolytic enzymes as well as variants thereof obtained by site-directed or random mutagenesis or any other protein engineering method, any fragment of a proteolytic enzyme, and any molecular complex or fusion protein comprising one of the aforementioned proteins. A "chimera of proteases" means a fusion protein made of two or more fragments derived from different parent proteases.

[0019] The term "substrate" or "peptide substrate" means any peptide, oligopeptide, or protein molecule of any amino acid composition, sequence or length that contains a peptide bond that can be hydrolyzed catalytically by a protease. The peptide bond that is hydrolyzed is referred to as the "cleavage site". Positions in the substrate are numbered according to the system introduced by Schechter & Berger (Biochem. Biophys. Res. Commun. 27 (1967) 157-162). Amino acid residues on the N-terminal side of the cleavage site are numbered P1, P2, P3, etc., counting outwards from the cleavage site, whereas residues on the C-terminal side are numbered P1', P2', P3', etc.
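The numbering convention can be expressed as a small helper. In this Python sketch (the function name and the 0-based index argument are illustrative), P1 is the residue immediately N-terminal to the scissile bond and P1' the residue immediately C-terminal to it; the example uses the t-PA requirements listed in Table I, read as the contiguous sequence C-P-G-R-V-V-G.

```python
def schechter_berger_labels(substrate: str, cleavage_after: int) -> dict[str, str]:
    """Label each residue of `substrate` with its Schechter & Berger position.

    `cleavage_after` is the 0-based index of the residue on the N-terminal side
    of the scissile bond; that residue becomes P1 and the next one P1'.
    """
    if not 0 <= cleavage_after < len(substrate) - 1:
        raise ValueError("the cleavage site must lie between two residues of the substrate")
    labels: dict[str, str] = {}
    for i, residue in enumerate(substrate):
        if i <= cleavage_after:
            labels[f"P{cleavage_after - i + 1}"] = residue  # N-terminal side: P1, P2, ...
        else:
            labels[f"P{i - cleavage_after}'"] = residue     # C-terminal side: P1', P2', ...
    return labels

print(schechter_berger_labels("CPGRVVG", cleavage_after=3))
```

With the bond after the arginine, R is labelled P1, the following V is P1', and C at the N-terminus is P4, matching the t-PA row of Table I.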

[0020] The term "specificity" means the ability of a protease to recognize and hydrolyze certain peptide substrates selectively while others remain uncleaved. Specificity can be expressed qualitatively and quantitatively. "Qualitative specificity" refers to the kind of amino acid residues that a protease accepts at certain positions of the peptide substrate. "Quantitative specificity" refers to the number of peptide substrates that are accepted as substrates. Quantitative specificity can be expressed by the term s, which is the negative logarithm of the number of all accepted peptide substrates divided by the number of all possible peptide substrates. Proteases that accept only a small portion of all possible peptide substrates have a "high specificity" (s ≫ 1). Proteases that accept almost any peptide substrate have a "low specificity". Proteases with very low specificity (s < 1) are also referred to as "unspecific proteases".
[0021] The term "first protease" describes any protease used in step (a) of this invention as the starting point in order to generate populations of protease variants that are related to this first protease. The term "first substrate" or "first peptide substrate" describes a substrate that is recognized and hydrolyzed by the first protease. The term "first specificity" describes the qualitative and quantitative specificity of the first protease.

[0022] The term "evolved protease" describes any protease that is generated by use of the method of the invention. The term "target substrate" or "target peptide substrate" describes a substrate that is recognized and hydrolyzed by the evolved protease. The term "target specificity" describes the qualitative and quantitative specificity of the evolved protease that is to be generated by use of the method of the invention. Thus, the target specificity defines the specificity of the evolved protease for the target peptide substrate, while other substrates are not recognized and hydrolyzed, or only very weakly.
[0023] The term "intermediate" or "intermediate substrate" describes any substrate that has an intermediate character between two other substrates. The intermediate character can be based on the amino acid composition, the amino acid sequence, the properties of the amino acid residues contained in the substrates, or a combination of these characteristics.
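One simple way to obtain substrates of intermediate character, based on the amino acid sequence alone, is to replace the residues of the first substrate by those of the target substrate one position at a time. The Python sketch below does this; the replacement order, the helper name and the example pair (the canonical TEV cleavage site ENLYFQS and the t-PA requirements C-P-G-R-V-V-G from Table I) are illustrative assumptions, not the series used in the invention.

```python
from typing import Optional, Sequence

def intermediate_substrates(first: str, target: str,
                            order: Optional[Sequence[int]] = None) -> list[str]:
    """Return successive substrates that move from `first` to `target`,
    changing one position per step; the last entry equals `target`."""
    if len(first) != len(target):
        raise ValueError("first and target substrates must have the same length")
    if order is None:
        order = range(len(first))  # default: N- to C-terminal direction
    current = list(first)
    series = []
    for i in order:
        if current[i] != target[i]:
            current[i] = target[i]
            series.append("".join(current))
    if "".join(current) != target:
        raise ValueError("order must cover every position in which the substrates differ")
    return series

for step, substrate in enumerate(intermediate_substrates("ENLYFQS", "CPGRVVG"), start=1):
    print(step, substrate)
```

Used as successive second substrates, such a series ends with the target substrate itself, as in embodiment (3); intermediates based on residue properties rather than identity would need a different rule.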
[0024] Catalytic properties of proteases are expressed using the kinetic parameters "KM" or "Michaelis-Menten constant", "kcat" or "catalytic rate constant", and "kcat/KM" or "catalytic efficiency", according to the definitions of Michaelis and Menten (Fersht, A., Enzyme Structure and Mechanism, W. H. Freeman and Company, New York, 1995). The term "catalytic activity" describes the rate of conversion of the substrate under defined conditions.

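For reference, the Michaelis-Menten rate law links these parameters: v = kcat [E]0 [S] / (KM + [S]), which approaches (kcat/KM) [E]0 [S] when [S] is much smaller than KM. The Python sketch below, with two hypothetical variants, shows why the substrate concentration chosen for screening matters: at saturating substrate the variant with the higher kcat is faster, whereas well below KM the variant with the higher kcat/KM (here, the one with the lower KM) wins.

```python
def mm_rate(kcat: float, km: float, e0: float, s: float) -> float:
    """Michaelis-Menten initial rate v = kcat * [E]0 * [S] / (KM + [S])."""
    return kcat * e0 * s / (km + s)

# Hypothetical variants: (kcat in 1/s, KM in M)
variants = {
    "high kcat, high KM": (10.0, 1e-3),  # kcat/KM = 1e4 per M per s
    "low kcat, low KM": (1.0, 1e-5),     # kcat/KM = 1e5 per M per s
}

e0 = 1e-9  # enzyme concentration (M)
for s in (1e-2, 1e-6):  # saturating vs. sub-KM substrate concentration (M)
    best = max(variants, key=lambda name: mm_rate(*variants[name], e0, s))
    print(f"[S] = {s:g} M: fastest variant is {best!r}")
```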
[0025] Amino acids are abbreviated in one- or three-letter code according to the following Table II.



Table II: Amino acid abbreviations

One-letter  Three-letter  Amino acid
A           Ala           Alanine
C           Cys           Cysteine
D           Asp           Aspartic acid
E           Glu           Glutamic acid
F           Phe           Phenylalanine
G           Gly           Glycine
H           His           Histidine
I           Ile           Isoleucine
K           Lys           Lysine
L           Leu           Leucine
M           Met           Methionine
N           Asn           Asparagine
P           Pro           Proline
Q           Gln           Glutamine
R           Arg           Arginine
S           Ser           Serine
T           Thr           Threonine
V           Val           Valine
W           Trp           Tryptophan
Y           Tyr           Tyrosine



[0026] As set forth above, the present invention is directed to a method for generating sequence-specific proteases with a target substrate specificity by applying principles of molecular evolution. According to the invention, this is achieved by providing a population of related proteases and a peptide substrate that resembles the target substrate, and by selecting one or more protease variants from the population with respect to their specificity for the provided substrate. The selection is done under conditions that allow the identification of proteases that preferentially recognize and hydrolyze the target sequence.

[0027] In particular, embodiment (1) of the invention relates to a method for generating sequence-specific proteases with target substrate specificities, wherein the following steps are carried out:

(a) providing a population of proteases, wherein each variant is related to one or more first proteases, these first proteases having a first substrate specificity;

(b) providing one or more peptide substrates comprising at least one amino-acid sequence that resembles the target peptide substrate;

(c) selecting one or more protease variants from the population of proteases provided in step (a) with respect to their specificity for the substrate provided in step (b), under conditions that allow the identification of proteases that preferentially recognize and cleave the target sequence;

and wherein steps (a) to (c) are carried out cyclically until one or more protease variants with the target substrate specificity are identified.

[0028] When repeating steps (a) to (c), the one or more proteases selected in step (c) of one cycle are used as the 
one or more first proteases in step (a) of the next cycle. 
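The cyclic character of steps (a) to (c) can be illustrated with a toy simulation. In the Python sketch below, a protease variant is represented, purely hypothetically, by a string of preferred residues at each subsite, and its "activity" on a substrate is the fraction of matching subsites; real screening measures catalytic activities, so the model, the helper names and the parameter values (library size, number of variants kept, number of cycles) are assumptions. What the sketch does reflect is the structure of the method: variants are generated from the current parents, screened on the substrate, and the selected variants become the first proteases of the next cycle.

```python
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the twenty amino acids of Table II

def activity(profile: str, substrate: str) -> float:
    """Toy activity: fraction of subsites whose preferred residue matches the substrate."""
    return sum(p == s for p, s in zip(profile, substrate)) / len(substrate)

def mutate(profile: str, rng: random.Random) -> str:
    """Toy mutagenesis: replace the preferred residue at one random subsite."""
    i = rng.randrange(len(profile))
    return profile[:i] + rng.choice(ALPHABET) + profile[i + 1:]

def evolve(first: str, target: str, cycles: int = 5,
           library_size: int = 200, keep: int = 5, seed: int = 0) -> list[str]:
    rng = random.Random(seed)
    parents = [first]  # cycle 1 starts from the first protease
    for _ in range(cycles):
        # step (a): a population of variants related to the current parents
        library = [mutate(rng.choice(parents), rng) for _ in range(library_size)]
        # steps (b) and (c): screen on the substrate and keep the best variants;
        # parents stay in the pool, so the best score can never decrease
        pool = sorted(parents + library, key=lambda p: activity(p, target), reverse=True)
        parents = pool[:keep]  # selected variants seed the next cycle
    return parents

best = evolve("ENLYFQS", "CPGRVVG")[0]
print(best, activity(best, "CPGRVVG"))
```

The toy score only rewards activity on the target; the selection for specificity described in this document would additionally penalize activity on other peptides, for example competitor peptides or the first substrate.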

[0029] In one alternative of the invention, the one or more first proteases serving as starting points in step (a) of the method have a high sequence specificity, which remains high during the directed evolution towards the target specificity.

[0030] In another alternative of the method, the one or more first proteases serving as starting points in step (a) of the 
method have a low sequence specificity, which is increased during the directed evolution towards the target specificity. 
[0031] The steps (a) to (c) of the above method are carried out for at least one cycle. Preferably, however, these steps are carried out for several cycles, with the one or more protease variants selected in one cycle being the origin of the population of protease variants in the next cycle. Preferably, more than one and fewer than one hundred, more preferably more than two and fewer than fifty, particularly preferably more than three and fewer than twenty, especially preferably more than four and fewer than ten, and most preferably five cycles of steps (a) to (c) are carried out until one or more protease variants with the target substrate specificity are identified.

[0032] The invention applies evolutionary means as described in detail in WO 92/18645, that document being incorporated in its entirety for all purposes.

[0033] For an overview on the application of evolutionary principles to molecular biotechnology, which is usually 
referred to as "directed evolution" or "evolutionary biotechnology", see the review by Koltermann & Kettling (Biophys. 
Chem. 66 (1997) 159-177). 

[0034] Part of the invention is the provision of populations of protease variants wherein each variant is related to one or more first proteases. In principle, there can be a large number of these first proteases, all together being the origin for the first cycle of the method. It is preferred, however, that these first proteases comprise fifty or fewer different proteases, more preferably ten or fewer, especially preferably two or fewer. Most preferably, only one first protease is employed.

[0035] According to the invention, any protease can be used as first protease. Preferably, an endoprotease is used as first protease. It is preferred that the protease belongs to the group of proteases consisting of serine proteases (EC 3.4.21), cysteine proteases (EC 3.4.22), aspartic proteases (EC 3.4.23), and metalloproteases (EC 3.4.24). First proteases are characterized by their ability to recognize and hydrolyze peptide substrates with a certain qualitative and quantitative specificity. First proteases can have a specificity in the same range as the specificity of the protease that is to be generated. Examples of proteases with relatively high specificities are TEV protease, HIV-1 protease, BAR1 protease, Factor Xa, thrombin, tissue-type plasminogen activator, Kex2 protease, TVMV protease, RSV protease, MuLV protease, MPMV protease, MMTV protease, BLV protease, EIAV protease and SIVmac protease. Alternatively, the first proteases have a lower specificity than the protease that is to be generated. As an extreme example of the latter, proteases with very low sequence specificity are employed, for example papain, trypsin, chymotrypsin, subtilisin, SET (trypsin-like serine protease from Streptomyces erythraeus), elastase, cathepsin G or chymase.

[0036] A particularly suitable protease is the BAR1 protease of S. cerevisiae (sp|P12630|BAR1_YEAST, Barrierpepsin precursor (EC 3.4.23.35) (extracellular "barrier" protein) (BAR proteinase); see SEQ ID NOB).

[0037] The populations of proteases are essentially provided as described in WO 92/18645. According to the invention, genes encoding protease variants are ligated into a suitable expression vector by standard molecular cloning techniques (Sambrook, J.; Fritsch, E.F.; Maniatis, T.; Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, New York, 1989). The vector is introduced into a suitable expression host cell, which expresses the corresponding protease variant. Particularly suitable expression hosts are bacterial hosts such as Escherichia coli or Bacillus subtilis, yeast hosts such as Saccharomyces cerevisiae or Pichia pastoris, mammalian hosts such as Chinese Hamster Ovary (CHO) or Baby Hamster Kidney (BHK) cell lines, or viral expression systems such as the Baculovirus system. Alternatively, systems for in vitro protein expression can be used.

[0038] In a preferred embodiment of the invention, the genes are ligated into the expression vector behind a suitable signal sequence that leads to secretion of the protease variants into the extracellular space, thereby allowing direct detection of protease activity in the cell supernatant. Particularly suitable signal sequences are HlyA for Escherichia coli; AprE, NprB, Mpr, AmyA, AmyE, Blac and SacB for Bacillus subtilis; and Bar1, Suc2, Mata, InulA and Ggplp for S. cerevisiae.
[0039] In another preferred embodiment of the invention, both the protease variants and the peptide substrates are expressed
intracellularly. Preferably, this is done essentially as described in WO 0212543,
using a fusion peptide substrate comprising two autofluorescent proteins linked by the substrate amino-acid sequence.
[0040] In another preferred embodiment of the invention, the protease variants are expressed intracellularly, or secreted
into the periplasmic space using signal sequences such as DsbA, PhoA, PelB, OmpA, OmpT or gIII for Escherichia
coli, followed by a permeabilisation or lysis step to release the protease variants into the supernatant. The cell envelope
can be disrupted by mechanical means such as sonication or a French press, or by cell-wall-degrading enzymes such
as lysozyme.

[0041] As a further alternative, the genes encoding the protease variants are expressed cell-free by the use of a
suitable cell-free expression system. In a particularly preferred embodiment, the S30 extract from Escherichia coli cells
is used for this purpose as described by Lesley et al. (Methods in Molecular Biology 37 (1995) 265-278).
[0042] Relatedness to the one or more first proteases can be achieved by several procedures. For example, the
genes encoding the one or more first proteases are modified by methods for random nucleic acid mutagenesis. In a
preferred embodiment of the invention, random mutagenesis is achieved by the use of a polymerase as described in
WO 9218645. According to this embodiment, the one or more genes encoding the one or more first proteases are
amplified by the use of a polymerase with a high error rate, or under conditions that increase the rate of misincorporations,
thereby leading to a population of genes wherein each gene encodes a protease that is related to the one or more first
proteases. For example, the method according to Cadwell, R.C. and Joyce, G.F. can be employed (PCR Methods Appl.
2 (1992) 28-33). Other methods for random mutagenesis make use of mutator strains, UV radiation
or chemical mutagens. Most preferably, errors are introduced into the gene at a rate near to, but below, the error threshold as
described in WO 9218645.
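The idea of a population of related genes produced by error-prone amplification can be sketched in Python. The sketch assumes, for illustration only, that each base is substituted independently with a fixed probability and that the replacement is drawn uniformly from the three other bases; real error-prone PCR has base-specific biases. The function names and the example rate are hypothetical and are not taken from Cadwell and Joyce or WO 9218645. The expected number of mutations per gene is the error rate times the gene length, which is the quantity kept near, but below, the error threshold.

```python
import random

BASES = "ACGT"


def error_prone_copy(gene: str, error_rate: float, rng: random.Random) -> str:
    """Return a copy of gene in which each base is substituted with probability error_rate.

    Assumption: substitutions are independent and unbiased (uniform over the
    three alternative bases); insertions and deletions are not modelled.
    """
    copy = []
    for base in gene:
        if rng.random() < error_rate:
            copy.append(rng.choice([b for b in BASES if b != base]))
        else:
            copy.append(base)
    return "".join(copy)


def mutant_library(gene: str, error_rate: float, size: int, seed: int = 0) -> list[str]:
    """Generate `size` independently mutated copies of gene (reproducible via seed)."""
    rng = random.Random(seed)
    return [error_prone_copy(gene, error_rate, rng) for _ in range(size)]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    gene = "ATG" + "GCT" * 399  # hypothetical 1,200 nt gene
    library = mutant_library(gene, error_rate=0.002, size=500)
    mean = sum(hamming(gene, v) for v in library) / len(library)
    print(f"expected {0.002 * len(gene):.1f} mutations/gene, observed mean {mean:.2f}")
```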

[0043] In another preferred embodiment of the invention, certain parts of the gene encoding the protease variants are
randomized completely with respect to the amino-acid sequence and are re-introduced into the gene as an oligonucleotide
cassette. This technique is usually referred to as cassette mutagenesis (Oliphant, A.R. et al., Gene 44 (1986)
177-183; Horwitz, M.S. et al., Genome 31 (1989) 112-117). In a particularly preferred embodiment of the invention, the
part of the gene that encodes amino acid residues that are essential for recognition of the substrate is randomized via
cassette mutagenesis. These residues can be identified from structural studies. In particular, residues forming parts
of the substrate binding pocket are targeted by cassette mutagenesis. Alternatively, such residues can be identified by
substituting each amino acid residue with alanine and analyzing whether there is an effect on the catalytic activity. As a further
alternative, these residues can be identified by first introducing random mutations into the gene, screening for an effect
on specificity, affinity or catalytic activity, and afterwards determining the position of mutations in variants that show
altered specificity, affinity or catalytic activity. In the extreme case of this approach, the completely randomized
sequence is only a single codon long. This approach is typically referred to as site saturation mutagenesis.
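Site saturation mutagenesis is commonly carried out with a degenerate codon; the text does not say which one, so the sketch below assumes the widely used NNK scheme (N = A/C/G/T, K = G/T). It enumerates the matching codons and translates them with the standard genetic code, showing that NNK encodes all 20 amino acids with 32 codons and only one stop codon (TAG). The helper names are illustrative.

```python
from itertools import product

# Standard genetic code; codons enumerated with bases in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AMINO[16 * i + 4 * j + k]
    for (i, a), (j, b), (k, c) in product(enumerate(_BASES), repeat=3)
}

# Subset of IUPAC degenerate nucleotide codes used here.
DEGENERATE = {"N": "ACGT", "K": "GT", "S": "CG"}


def expand(pattern: str) -> list[str]:
    """List every codon matched by a degenerate pattern such as 'NNK'."""
    return ["".join(c) for c in product(*(DEGENERATE.get(ch, ch) for ch in pattern))]


def saturation_summary(pattern: str) -> tuple[int, int, int]:
    """Return (number of codons, distinct amino acids encoded, stop codons)."""
    codons = expand(pattern)
    encoded = [CODON_TABLE[c] for c in codons]
    return len(codons), len(set(encoded) - {"*"}), encoded.count("*")


if __name__ == "__main__":
    print("NNK:", saturation_summary("NNK"))  # NNK: (32, 20, 1)
```

Compared with a fully random NNN codon (64 codons, three stops), NNK halves the library size per randomized position while still covering every amino acid.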
[0044] In another preferred embodiment of the invention, nucleic acid sequences are randomly introduced into or
deleted from the one or more first protease genes in order to provide a population of proteases. This approach is referred
to as insertion and/or deletion mutagenesis. For insertion mutagenesis, random sequences of defined or random length
are introduced randomly into a gene. As an example, the method described by Hallet et al. (Nucleic Acids Res. 1997,
vol. 25, p. 1866ff) can be used to introduce a random 15 nt sequence randomly into a gene. Alternatively, defined
sequences, for example a sequence encoding a specific protein secondary structure motif, can be inserted randomly
into a gene. Alternatively, random sequences of defined or random length can be inserted at specific sites into a gene.
This can be done using restriction sites or by oligonucleotide overlap extension methods such as the method described
by Horton (Gene 1989, vol. 77, p. 61ff). For deletion mutagenesis, sequences of defined or random length are deleted
randomly from a gene. In a particular embodiment of the invention, deletion and insertion mutagenesis are combined
so that insertions at one site can be combined with, and thereby possibly compensated by, deletions at another site.
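Random insertion and deletion can be modelled at the sequence level as below. This is a simplified sketch, not the transposon-based chemistry of Hallet et al.: it inserts a random stretch of a chosen length at a random position (optionally at a codon boundary with a length divisible by 3, as for a 15 nt insertion, so that downstream codons stay in frame) or deletes a stretch at a random position. Random inserts may still contain stop codons. The function names are hypothetical.

```python
import random

NUCLEOTIDES = "ACGT"


def random_insertion(gene: str, length: int, rng: random.Random,
                     in_frame: bool = True) -> tuple[str, int]:
    """Insert `length` random nucleotides at a random position; return (new gene, position).

    With in_frame=True the insertion point lies on a codon boundary and length must be
    a multiple of 3, so codons downstream of the insertion keep their reading frame.
    """
    if in_frame and length % 3:
        raise ValueError("an in-frame insertion needs a length divisible by 3")
    pos = 3 * rng.randint(0, len(gene) // 3) if in_frame else rng.randint(0, len(gene))
    insert = "".join(rng.choice(NUCLEOTIDES) for _ in range(length))
    return gene[:pos] + insert + gene[pos:], pos


def random_deletion(gene: str, length: int, rng: random.Random) -> tuple[str, int]:
    """Delete `length` consecutive nucleotides starting at a random position."""
    pos = rng.randint(0, len(gene) - length)
    return gene[:pos] + gene[pos + length:], pos


if __name__ == "__main__":
    rng = random.Random(0)
    gene = "ATG" + "GCT" * 20  # hypothetical 63 nt gene
    inserted, pos = random_insertion(gene, 15, rng)
    print("15 nt inserted at position", pos, "->", len(inserted), "nt")
```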
[0045] In a further preferred embodiment of the invention, methods for homologous in-vitro recombination are used
for the provision of protease populations. Examples of methods that can be applied are the Recombination Chain Reaction
(RCR) according to WO 0134835, the DNA-Shuffling method according to WO 9522625, the Staggered Extension
method according to WO 9842728, or the Random Priming recombination according to WO 9842728. Furthermore,
methods for non-homologous recombination such as the ITCHY method can be applied (Ostermeier, M. et al., Nature
Biotechnology 17 (1999) 1205-1209). All of the references mentioned above are hereby incorporated by reference in
their entirety for all purposes.
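As a toy model of what recombination does to a population, the sketch below assembles chimeras from aligned parent sequences by switching templates at random crossover points. It is an assumption-laden illustration, not an implementation of RCR, DNA shuffling or StEP: those methods place crossovers preferentially where parents share sequence identity, which this sketch ignores. The function name is hypothetical.

```python
import random


def chimera(parents: list[str], n_crossovers: int, rng: random.Random) -> str:
    """Assemble one chimera from aligned parent sequences of equal length.

    Crossover points are drawn uniformly at random; at each point the template is
    re-drawn from all parents (it may stay the same). Sequence identity, which
    governs crossover placement in real homologous recombination, is ignored.
    """
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("parents must be aligned to the same length")
    points = sorted(rng.sample(range(1, length), n_crossovers))
    pieces, start = [], 0
    for end in points + [length]:
        template = rng.choice(parents)
        pieces.append(template[start:end])
        start = end
    return "".join(pieces)


if __name__ == "__main__":
    rng = random.Random(0)
    parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]
    print([chimera(parents, 2, rng) for _ in range(3)])
```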

[0046] In further embodiments of the invention, the above-mentioned methods are combined with each other. In a
particularly preferred embodiment, the Recombination Chain Reaction is combined with random mutagenesis such as
error-prone PCR according to Cadwell, R.C. and Joyce, G.F. (PCR Methods Appl. 2 (1992) 28-33) in order to decouple
mutations selected in the previous round and, simultaneously, to introduce a defined number of new random mutations into
the population.

[0047] The coupling of protease genotype and phenotype is achieved by using sample carriers that enable compartmentation
of samples, and the genotypes are distributed into the sample carriers at a multiplicity per compartment
that allows sufficient differentiation of phenotypes.
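How the multiplicity per compartment affects the differentiation of phenotypes can be estimated if one assumes, as is usual for random dispensing but not stated in the text, that the number of genotypes per compartment follows a Poisson distribution with mean λ. Compartments holding more than one genotype show a mixed phenotype, so λ trades throughput (few empty compartments) against clonality. The function names are illustrative.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(exactly k genotypes in a compartment) for Poisson loading with mean lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)


def loading_summary(lam: float) -> dict[str, float]:
    """Fractions of empty, single-genotype and mixed compartments, plus the share of
    occupied compartments that are clonal (contain exactly one genotype)."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    occupied = 1.0 - empty
    return {
        "empty": empty,
        "single": single,
        "mixed": occupied - single,
        "clonal_among_occupied": single / occupied if occupied > 0 else 0.0,
    }


if __name__ == "__main__":
    for lam in (0.1, 0.3, 1.0, 3.0):
        s = loading_summary(lam)
        print(f"lambda={lam:>4}: " + ", ".join(f"{k}={v:.3f}" for k, v in s.items()))
```

At λ = 0.1 about 95 % of occupied compartments are clonal but about 90 % are empty; at λ = 1 only about 58 % of occupied compartments are clonal.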

[0048] The one or more first proteases that serve as the starting point of the method either have a specificity which
is in the range of the target specificity that is to be generated by the method, or have a lower specificity than the target
specificity. Accordingly, the method of the invention is either performed under conditions that maintain the specificity
quantitatively and alter it qualitatively (Alternative A), or under conditions that
maintain the specificity qualitatively and increase it quantitatively (Alternative B). Moreover, both approaches can be
combined. These three principal alternatives are shown schematically in Figure 2.

[0049] In a preferred embodiment of the invention corresponding to alternative A, the one or more first proteases have
a first specificity that is quantitatively in the range of the target specificity, but qualitatively distinct from the target specificity.
Proteases having the target substrate specificity are obtained using the method of the invention by selecting protease
variants under conditions that allow identification of proteases that recognize and cleave the target sequence preferentially.
[0050] In another preferred embodiment of the invention corresponding to alternative B, the specificity of the one or
more first proteases is quantitatively lower than the target specificity. This means that they accept and
hydrolyze a larger number of peptide substrates. This low first specificity is subsequently increased by the method of
the invention until it is in the range of the target specificity.
[0051] As a preferred variant of this embodiment, the first specificity is qualitatively related to the target specificity.
Thus, the large number of peptide substrates that is accepted and hydrolyzed already includes the target substrate.
Accordingly, amino acid residues that are essential in the first substrate remain essential residues in the target substrate.
Proteases having the target substrate specificity are then obtained using the method of the invention by selecting protease
variants under conditions that allow identification of proteases that recognize and cleave the target sequence preferentially.

[0052] Another part of the invention is the provision of peptide substrates that resemble the target substrate, and the 
use of these substrates for screening of protease variants with respect to their catalytic activity. 

[0053] In a preferred embodiment of the invention, suitable peptide substrates are synthesized via the solid-phase
peptide synthesis approach of Merrifield et al. (Nature 207 (1965) 522-523). These peptide substrates are then incubated
for a certain time in a sample buffer containing the protease variant to be tested. The hydrolysis of the peptide is then
analyzed by a suitable method. For example, the amount of fragmented peptides can be analyzed by chromatography;
in particular, peptide fragments are advantageously analyzed on a reversed-phase HPLC system. Alternatively, the
peptide substrate is modified in any way that enables the analysis of peptide hydrolysis. In particular, the peptide substrate
may carry functional groups that enable the detection of the hydrolysis of the substrate. Such functional groups include,
but are not limited to, the following:

one or more fluorophores or chromophores, whose spectroscopic properties change upon hydrolysis of the peptide,
whereby screening is performed through determination of the change in spectroscopic properties; or

two fluorophores which are distinguishable by their fluorescence properties and which are attached to opposite ends
of the second substrate, whereby the screening is performed through confocal fluorescence spectroscopy at fluorophore
concentrations below 1 µM; or

two fluorophores which form a fluorescence resonance energy transfer (FRET) pair and which are attached to
opposite ends of the second substrate, whereby screening is performed through determination of the decrease in
the energy transfer between the two fluorophores; or

a first and second autofluorescent protein flanking the second substrate, whereby the screening is performed through
confocal fluorescence spectroscopy at substrate concentrations below 1 µM; or

a fluorophore and a quencher molecule which are attached to opposite ends of the second substrate, whereby
screening is performed through determination of the decrease in quenching of the fluorophore; or

a fluorophore or a chromophore and a binding moiety which are attached to opposite ends of the second substrate,
whereby screening is performed through determination of binding of the binding moiety to a specific binding partner;
or

a radioactive label and a binding moiety which are attached to opposite ends of the second substrate, whereby
screening is performed through use of a scintillation proximity assay; or

any combination thereof.

[0054] With respect to the above-mentioned functional groups, a chemical group can be attached to the peptide that
alters its properties when the peptide is hydrolyzed. For example, a para-nitrophenyl group can be used for this purpose.
As another example, one or more fluorophores and/or a quencher molecule are attached to the peptide, and the amount
of fragmented peptide is analysed by measuring a difference in the fluorescence of the fluorophores. For example, two
fluorophores that are suited to form a FRET (fluorescence resonance energy transfer) pair are attached to the peptide
at opposite ends, and the hydrolysis of the peptide is measured by a decrease in the energy transfer between the two
fluorophores. For example, Rhodamine Green (Molecular Probes Inc., Oregon, USA) and Tetramethylrhodamine (Molecular
Probes Inc., Oregon, USA) can be used as fluorophores that are suited to form such a FRET pair.
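Why cleavage lowers the energy transfer follows from the Förster relation E = 1/(1 + (r/R0)^6), where r is the donor-acceptor distance and R0 the Förster radius of the dye pair. The sketch evaluates this relation and converts an ensemble efficiency into a cleaved fraction. It assumes (i) that the efficiency is measured from donor quenching, so it averages linearly over intact and cleaved molecules, and (ii) that separated fragments show negligible transfer at screening concentrations. The distance and R0 in the example are hypothetical placeholders, not values for Rhodamine Green and Tetramethylrhodamine.

```python
def fret_efficiency(r_nm: float, r0_nm: float) -> float:
    """Förster transfer efficiency for donor-acceptor distance r and Förster radius R0."""
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


def fraction_cleaved(e_observed: float, e_intact: float, e_cleaved: float = 0.0) -> float:
    """Fraction of substrate cleaved, assuming the ensemble efficiency is a linear
    mix of the intact-substrate and cleaved-fragment efficiencies."""
    return (e_intact - e_observed) / (e_intact - e_cleaved)


if __name__ == "__main__":
    e_intact = fret_efficiency(r_nm=4.0, r0_nm=5.0)  # hypothetical geometry
    e_measured = 0.5 * e_intact                       # half of the transfer lost
    print(f"E(intact) = {e_intact:.3f}, "
          f"cleaved fraction = {fraction_cleaved(e_measured, e_intact):.2f}")
```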

[0055] In a particularly preferred embodiment of the invention, two fluorophores that do not form a substantial FRET
pair are attached to opposite ends of synthetic peptide substrates. As an example, Rhodamine Green (Molecular Probes
Inc., Oregon, USA) and Cy5 (Amersham Biosciences Europe GmbH, Freiburg) can be used for this purpose, and
covalent attachment of the dye can be achieved via a succinimidyl ester linkage to a primary amino group of the peptide.
Hydrolysis of these peptides is preferably analysed by means of confocal fluorescence spectroscopy according to patent
applications WO 9416313 and WO 9613744, which are hereby incorporated by reference in their entirety for all purposes.
Owing to the high sensitivity of confocal fluorescence spectroscopy, substrates are used at concentrations below one
micromolar, more preferably below one hundred nanomolar, and most preferably below ten nanomolar. Therefore, screening
according to this embodiment is done substantially below the KM of typical proteases.
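Why confocal detection still works at these low concentrations becomes clear from the mean number of molecules in the detection volume, N = c · V · N_A. The sketch assumes a detection volume of about one femtolitre, a typical order of magnitude for confocal setups but not a value given in the text or the cited applications.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_in_volume(concentration_molar: float, volume_litre: float) -> float:
    """Mean number of molecules in a detection volume: N = c * V * N_A."""
    return concentration_molar * volume_litre * AVOGADRO


if __name__ == "__main__":
    focal_volume = 1e-15  # litre (1 fL); assumed order of magnitude
    for label, conc in (("1 uM", 1e-6), ("100 nM", 1e-7), ("10 nM", 1e-8)):
        n = molecules_in_volume(conc, focal_volume)
        print(f"{label:>6}: about {n:.1f} molecules in focus")
```

Under this assumption, 10 nM corresponds to only about six substrate molecules in the focus on average, which is the regime in which fluctuation-based confocal analysis is sensitive.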

[0056] In another particularly preferred embodiment of the invention, fusion proteins comprising a first autofluorescent
protein, a peptide, and a second autofluorescent protein are used as peptide substrates. According to WO 0212543,
which is hereby incorporated by reference in its entirety for all purposes, autofluorescent proteins include the Green Fluorescent
Protein GFP and its mutants, as well as DsRed and its mutants. Fusion proteins can be produced by expression of a
suitable fusion gene in E. coli, lysis of cells and purification of the fusion protein by standard methods such as ion
exchange chromatography or affinity chromatography.

[0057] It is an essential part of the invention that proteases with the target substrate specificity are generated by
selecting protease variants under conditions that allow identification of proteases that recognize and cleave the target
sequence preferentially. This selection can be achieved according to the different aspects of the invention as outlined below.
[0058] In a first aspect of the invention, proteases that recognize and cleave the target sequence preferentially are
identified by screening for proteases with a high affinity for the target substrate sequence. High affinity corresponds to
a low KM, which is selected by screening at target substrate concentrations substantially below the KM of the first protease.
This aspect is referred to as the "affinity approach".
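A short Michaelis-Menten calculation shows what screening far below KM actually rewards. With v = kcat[E][S]/(KM + [S]), the rate at [S] << KM approaches (kcat/KM)[E][S], so such a screen ranks variants by the specificity constant kcat/KM, in which a low KM is favoured. The two variants below are hypothetical: A has a high kcat but a high KM, B a lower kcat but a low KM.

```python
def mm_rate(kcat: float, km: float, enzyme: float, substrate: float) -> float:
    """Michaelis-Menten initial rate v = kcat [E] [S] / (KM + [S]).

    Units are up to the caller; here kcat in 1/s and KM, [E], [S] in micromolar.
    """
    return kcat * enzyme * substrate / (km + substrate)


VARIANTS = {  # hypothetical kinetic constants: (kcat in 1/s, KM in uM)
    "A (high kcat, high KM)": (10.0, 100.0),
    "B (low kcat, low KM)": (1.0, 1.0),
}

if __name__ == "__main__":
    enzyme = 0.001  # uM, i.e. 1 nM
    for substrate in (0.01, 1000.0):  # 10 nM (screen) versus 1 mM (saturating)
        rates = {name: mm_rate(kcat, km, enzyme, substrate)
                 for name, (kcat, km) in VARIANTS.items()}
        winner = max(rates, key=rates.get)
        print(f"[S] = {substrate} uM -> faster variant: {winner}")
```

At 10 nM variant B is about ten times faster (the ratio of the kcat/KM values), whereas at saturating substrate variant A wins; a low-concentration screen therefore selects B.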

[0059] In a preferred embodiment of this aspect of the invention, the peptide substrate provided in step (b) is linked
to one or more fluorophores that enable the detection of the hydrolysis of the peptide substrate at concentrations below
10 µM, preferably below 1 µM, more preferably below 100 nM, and most preferably below 10 nM.

[0060] In a second aspect of the invention, proteases that recognize and cleave the target sequence only are identified 
by providing two or more peptide substrates in step (b) and by screening for activity on these two or more peptide 
substrates in comparison. This aspect is referred to as the "comparison approach". 

[0061] In a preferred embodiment of this aspect of the invention, the two or more peptide substrates provided in step
(b) are linked to different marker molecules, thereby enabling the detection of the cleavage of the two or more peptide
substrates consecutively or in parallel. In a particularly preferred embodiment of the invention, two peptide substrates
are provided in step (b): one peptide substrate has an amino acid sequence identical to or resembling the first peptide
substrate, which makes it possible to monitor the original activity of the first proteases, and the other peptide substrate has
an amino-acid sequence identical to or resembling the target substrate sequence, which makes it possible to monitor the activity
on the target substrate. In an especially preferred embodiment of the invention, these two peptide substrates are linked
to fluorescent marker molecules, and the fluorescent properties of the two peptide substrates are sufficiently different
to distinguish both activities when measured consecutively or in parallel. For example, a fusion protein comprising
a first autofluorescent protein, a peptide, and a second autofluorescent protein according to patent application WO
0212543 can be used for this purpose. Alternatively, fluorophores such as rhodamines are linked chemically to the
peptide substrates.
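In practice, the comparison approach reduces to a two-channel readout per sample. The following sketch ranks variants by the ratio of their activity on the target substrate to their activity on the first substrate and keeps only those with sufficient target activity. It assumes that both rates have already been background-corrected and calibrated to comparable units (for example, cleaved molecules per unit time); the data class, thresholds and example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    variant: str
    rate_first: float   # calibrated cleavage rate on the first (original) substrate
    rate_target: float  # calibrated cleavage rate on the target substrate


def select_by_comparison(data: list[Measurement], min_target_rate: float,
                         min_ratio: float) -> list[tuple[str, float]]:
    """Return (variant, target/first ratio) for hits, best ratio first."""
    hits = []
    for m in data:
        ratio = m.rate_target / m.rate_first if m.rate_first > 0 else float("inf")
        if m.rate_target >= min_target_rate and ratio >= min_ratio:
            hits.append((m.variant, ratio))
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    screen = [
        Measurement("parent", rate_first=50.0, rate_target=0.5),
        Measurement("v1", rate_first=40.0, rate_target=8.0),
        Measurement("v2", rate_first=2.0, rate_target=6.0),
        Measurement("v3", rate_first=0.0, rate_target=0.1),
    ]
    print(select_by_comparison(screen, min_target_rate=1.0, min_ratio=1.0))
```

Requiring a minimum target rate keeps nearly inactive variants (such as v3, whose ratio is infinite only because both activities are tiny) from ranking first.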

[0062] In a third aspect of the invention, proteases that recognize and cleave the target sequence preferentially are
identified by providing in step (b) one or more peptide substrates resembling the target peptide together with competing
peptide substrates in high excess. Screening with respect to activity on the substrates resembling the target substrate
is then done in the presence of the competing substrates. Proteases having a specificity which corresponds qualitatively
to the target specificity, but having only a low quantitative specificity, are identified as negative samples in such a screen,
whereas proteases having a specificity which corresponds qualitatively and quantitatively to the target specificity are
identified as positive. This aspect is referred to as the "competitor approach".

[0063] In a preferred embodiment of this aspect of the invention, the one or more peptide substrates resembling the
target substrate are linked to marker molecules, thereby enabling the detection of their hydrolysis, whereas the competing
peptide substrates do not carry marker molecules. The competing peptide substrates have an amino-acid sequence
identical to or resembling the first peptide substrate, or have random amino-acid sequences, thereby acting as competitive
inhibitors of the hydrolysis of the marker-carrying peptide substrates.
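For two substrates competing for the same active site, standard Michaelis-Menten kinetics give the initial rate on the labelled substrate A in the presence of an unlabelled competitor B as v_A = (kcat,A/KM,A)[E][A] / (1 + [A]/KM,A + [B]/KM,B). The sketch uses this expression to compare two hypothetical variants with identical kinetics on the labelled target-like substrate but different affinities for the competitor: the broadly specific variant is strongly suppressed by the excess competitor, the specific one much less. All constants are illustrative.

```python
def rate_with_competitor(kcat_a: float, km_a: float, a: float,
                         km_b: float, b: float, enzyme: float) -> float:
    """Initial rate on labelled substrate A when substrate B competes for the
    same active site (both substrates obey Michaelis-Menten kinetics)."""
    return (kcat_a / km_a) * enzyme * a / (1.0 + a / km_a + b / km_b)


if __name__ == "__main__":
    enzyme, labelled, competitor = 0.001, 0.01, 100.0  # uM; competitor in 10,000-fold excess
    for name, km_competitor in (("broad variant", 10.0), ("specific variant", 1000.0)):
        v = rate_with_competitor(1.0, 10.0, labelled, km_competitor, competitor, enzyme)
        print(f"{name}: v_A = {v:.2e} uM/s")
```

With [B] = 0 the expression reduces to the ordinary Michaelis-Menten rate, so the suppression seen in the screen is due entirely to the competitor term [B]/KM,B.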

[0064] In a fourth aspect of the invention, proteases that recognize and cleave the target sequence preferentially are
identified by using intermediate substrates for evolving the protease towards the target substrate specificity. This aspect
is hereinafter also referred to as the "intermediate approach". In a first variant of this aspect of the invention, this is
achieved by providing different peptide substrates in different cycles, whereby each peptide substrate has an intermediate
character with regard to the substrate of the previous cycle and the target peptide substrate. According to this variant,
proteases are evolved gradually toward the target specificity. Figure 4 depicts schematically the basic principle of this
variant of the intermediate approach.

[0065] More generally, a first variant of this aspect of the invention is directed to a method for generating sequence-specific
proteases with a target substrate specificity, wherein the following steps are carried out:

(a) providing a population of proteases, wherein each variant is related to one or more first proteases, these first
proteases having specificity for a spectrum of peptide substrates or a single peptide substrate;

(b) providing one or more peptide substrates that have an intermediate character with regard to the first peptide
substrate and the target substrate;

(c) selecting one or more protease variants from the population of proteases provided in step (a) with respect to
their specificity for the substrate provided in step (b);

(d) repeating steps (a) to (c) until one or more protease variants with activity for the intermediate substrate provided
in step (b) are identified;

(e) replacing the first peptide substrate in steps (a) and (b) with the intermediate substrate, and the first proteases
in step (a) with the protease variants selected in step (c);

and repeating steps (a) to (e) until one or more protease variants with the target substrate specificity are identified.
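The control flow of steps (a) to (e) can be illustrated with a deliberately simple toy model. It is not a model of real protease biochemistry: a "protease" is reduced to a string of preferred residues, one per substrate position (loosely echoing the idea of separate subsites), its activity on a substrate is the fraction of positions it matches, and variation is a single random substitution. The function names, library size and round limit are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def activity(protease: str, substrate: str) -> float:
    """Toy activity: fraction of substrate positions matching the protease's preferences."""
    return sum(p == s for p, s in zip(protease, substrate)) / len(substrate)


def vary(protease: str, rng: random.Random) -> str:
    """Toy variation: replace one randomly chosen preference by a random residue."""
    i = rng.randrange(len(protease))
    return protease[:i] + rng.choice(AMINO_ACIDS) + protease[i + 1:]


def evolve_via_intermediates(first: str, substrates: list[str], rng: random.Random,
                             library_size: int = 200, max_rounds: int = 50,
                             keep: int = 10) -> list[str]:
    """Walk through intermediate substrates, the last one being the target."""
    selected = [first]
    for substrate in substrates:                      # (b); (e) moves on to the next one
        for _ in range(max_rounds):                   # (d) repeat (a) to (c)
            library = [vary(rng.choice(selected), rng) for _ in range(library_size)]  # (a)
            pool = selected + library
            best = max(activity(p, substrate) for p in pool)
            selected = [p for p in pool if activity(p, substrate) == best][:keep]   # (c)
            if best == 1.0:                           # variant accepting this substrate
                break
    return selected


if __name__ == "__main__":
    rng = random.Random(0)
    result = evolve_via_intermediates("ALY", ["ALF", "ARF", "NRF"], rng)
    print(result[0], activity(result[0], "NRF"))
```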
[0066] In this first variant of this aspect of the invention, evolution of protease specificity is directed via consecutive
selection on a certain number of intermediate peptide substrates, whereby each successive peptide substrate resembles
the target peptide sequence more closely. This approach is based on the finding that proteases which accept related substrates
are usually also related to each other. Relatedness of proteases in the context of this invention is a measure of the
homology of the amino acid sequences of two or more enzymes. Moreover, this approach is based on the surprising
discovery that distinguishable subsites in a protease active site can be evolved separately, and that their molecular
structure can be attributed to different residues of a peptide substrate (Schechter & Berger, Biochem. Biophys. Res.
Commun. 27 (1967) 157-162).

[0067] Intermediate substrates can be realized by substituting amino acid residues at one or more positions of the
first peptide sequence with amino acid residues at the same positions of the target peptide sequence. Such intermediates
are referred to as "amino acid composition intermediates". Additionally, an intermediate peptide substrate can
include one or more amino acid residues at one or more positions which are neither the residues of the first peptide
sequence nor the residues of the target peptide sequence at that position, but are amino acid residues with an intermediate
character with respect to the residues in the first and the target substrate. Such intermediates are referred to as "amino
acid property intermediates". The intermediate character of this kind of intermediate can be based on one or more
physical and chemical parameters, which include, but are not limited to, the surface of the residue, its volume, the
isoelectric point, the side-chain pKa, the polarity, the ability to form hydrogen bonds or the hydrophobicity. In the following
table, the twenty naturally occurring amino acid residues are classified according to these parameters.
Table III: Classification of the 20 naturally occurring amino acid residues
a Chothia, C., J. Mol. Biol. 105 (1975) 1-14; b Zamyatin, A.A., Prog. Biophys. Mol. Biol. 24 (1972) 107-123; c Tanford,
C., Adv. Prot. Chem. 17 (1962) 69-165; d charge at physiological pH; e Black, S.D., Mould, D.R., Anal. Biochem. 193
(1991) 72-82.



[0068] If, for example, the first substrate were ALY and the target substrate were NRF, intermediate substrates with
regard to the amino acid composition would be, for example, ALF, NRY or ARF (taking position 3, positions 1 and 2, or
positions 2 and 3, respectively, from the target substrate). An intermediate substrate with regard to amino acid properties
would be, for example, AQF, with the glutamine residue resembling the original leucine in that it is uncharged, but
resembling the arginine residue of the target substrate more closely with respect to hydrophobicity and its capacity to form
hydrogen bonds. A further example of this approach would be SLY, where S resembles A with respect to volume and surface
but is more similar to the target N in terms of hydrophobicity and hydrogen bonding.
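The amino acid composition intermediates in this example can be enumerated mechanically: with n differing positions there are 2^n - 2 of them. The sketch below lists them; property intermediates such as AQF or SLY require a residue classification such as Table III and are not covered. The function name is illustrative.

```python
from itertools import product


def composition_intermediates(first: str, target: str) -> list[str]:
    """All amino acid composition intermediates between two equal-length substrates:
    each position carries either the first or the target residue; the two end
    points themselves are excluded."""
    if len(first) != len(target):
        raise ValueError("substrates must have the same length")
    options = [(f, t) if f != t else (f,) for f, t in zip(first, target)]
    candidates = {"".join(p) for p in product(*options)}
    return sorted(candidates - {first, target})


if __name__ == "__main__":
    print(composition_intermediates("ALY", "NRF"))
    # ['ALF', 'ARF', 'ARY', 'NLF', 'NLY', 'NRY']
```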

[0069] Furthermore, the number of consecutive peptide substrates to be used depends on the relatedness of the first
peptide sequence and the target peptide sequence as well as on the quantitative specificity of the one or more first proteases.
The more unrelated the first peptide sequence and the target peptide sequence are, and the higher the specificity of the
one or more first proteases is, the more consecutive intermediate peptide substrates are required.

[0070] In a second variant of this aspect of the invention, different proteases that have specificity for different intermediates
are selected in parallel in a first step of the method. In a second step, proteases which have the target specificity
are then selected from a population containing randomly recombined chimeras of the proteases selected in the first step.
Preferably, the recombination of different proteases selected in parallel is achieved by employing an in-vitro homologous
recombination technique, such as the Recombination Chain Reaction described in patent application WO 0134835. Both
forms of intermediates can be used for this variant; however, amino acid composition intermediates are preferably
employed. Figure 11 shows schematically the basic principle of this variant of the fourth aspect of the invention.
[0071] The different intermediates employed in the first step of this variant are preferably chosen such that the
sum of all modifications introduced into these intermediates equals or resembles the characteristics of the target substrate.
As an example, if the first substrate were ALY and the target substrate were NRF, suitable amino acid composition
intermediates for this embodiment would be NLY, ARY and ALF. Proteases having specificity for these three substrates
would then be randomly recombined and screened for specificity towards the target substrate of this example, NRF.
Other examples, including intermediates with more than one modification, are constructed analogously.
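The choice of intermediates for this parallel variant amounts to simple sequence bookkeeping: one single-substitution intermediate per position in which the first and target substrates differ, chosen so that pooling all their modifications reproduces the target. The sketch checks that property at the sequence level only; it does not model the recombination of the corresponding proteases, and the function names are hypothetical.

```python
def single_substitution_intermediates(first: str, target: str) -> list[str]:
    """One intermediate per differing position, each carrying a single target residue."""
    return [first[:i] + t + first[i + 1:]
            for i, (f, t) in enumerate(zip(first, target)) if f != t]


def combined_modifications(first: str, intermediates: list[str]) -> str:
    """Sequence obtained by pooling every modification found in the intermediates.

    If two intermediates modify the same position differently, the later one wins.
    """
    combined = list(first)
    for seq in intermediates:
        for i, (f, s) in enumerate(zip(first, seq)):
            if s != f:
                combined[i] = s
    return "".join(combined)


if __name__ == "__main__":
    parts = single_substitution_intermediates("ALY", "NRF")
    print(parts)                                 # ['NLY', 'ARY', 'ALF']
    print(combined_modifications("ALY", parts))  # NRF
```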

[0072] In further aspects of the invention, two, three or all four of the different aspects mentioned above are combined
with each other. In a preferred combination, screening for proteases with decreased Michaelis-Menten constants is
combined with the use of intermediate substrates. In another preferred combination, screening for proteases with decreased
Michaelis-Menten constants is combined with the screening of two or more substrates consecutively or in parallel.
In a further preferred combination, screening for proteases with decreased Michaelis-Menten constants is combined
with using an excess of competing, unlabelled substrate. In a particularly preferred combination, all four aspects
(screening for proteases with decreased Michaelis-Menten constants, screening of two or more substrates in parallel,
the use of an excess of competing, unlabelled substrate, and the use of intermediate substrates) are combined with
each other.

[0073] In a particularly preferred embodiment of the invention, the target protease has a specificity similar to tissue-type
plasminogen activator and cleaves the target substrate CPGR↓VVGG. Such a target protease can, among others, be
generated by the above-defined method of the invention when the starting protease is BAR1 protease from S. cerevisiae.
In such a method, the following second/intermediate substrates are preferably utilized:

(i) WLGLVPGG

(ii) WLGQVPGG

(iii) WLGRVPGG

(iv) WLGRVVGG

(v) CPGRVVGG.
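The stepwise character of this substrate series can be made explicit by listing which positions change from one substrate to the next. The sketch reads substrates (iv) and (v) as WLGRVVGG and CPGRVVGG, treating the extracted "W" after "GR" as "VV" (tissue-type plasminogen activator cleaves plasminogen at the Arg-Val bond in ...CPGR↓VVGG...), so that all five substrates are eight residues long; positions are numbered from 1 at the N-terminus.

```python
SERIES = ["WLGLVPGG", "WLGQVPGG", "WLGRVPGG", "WLGRVVGG", "CPGRVVGG"]


def changed_positions(a: str, b: str) -> list[int]:
    """1-based positions at which two equal-length substrates differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]


if __name__ == "__main__":
    for before, after in zip(SERIES, SERIES[1:]):
        print(f"{before} -> {after}: positions {changed_positions(before, after)}")
```

Under this reading, each step changes one or two residues, and position 4 is walked from L through Q to R, Q acting as an amino acid property intermediate between the uncharged leucine and the arginine of the target.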


[0074] The sequence-specific protease obtainable by the methods described hereinbefore preferably has a specificity
similar to tissue-type plasminogen activator and cleaves the target substrate CPGR↓VVGG. The starting protease
preferably is BAR1 protease including, but not limited to, the one depicted in SEQ ID NO:8. Additionally, BAR1 proteases
modified by truncation of up to 200 aa, preferably in the range of 100 to 200 aa, more preferably in the range of 120 to 180
aa, most preferably in the range of 140 to 160 aa at the C- or N-terminus can be used as starting proteases. Even more
preferably, the sequence-specific protease is derived from said BAR1-derived protease and has at least one mutation
selected from the group comprising the modifications L33I, Y45D, T47A, T59I, N82D, E96V, M107I, N123D, E143D,
N151V, I152F, K161E, A163T, T165A, R178S, T221I, E231V, D321N, D367G, M369L, V370I, A399S, K404R and S440L.
Particularly preferred among said proteases are those having at least one of D367G, V370I, M107I, I152F, E143D and
E231V. The particularly preferred mutants of BAR1 protease may further be modified, e.g. by truncation of up to 10 aa
at the C- or N-terminal ends thereof or by deletion, insertion or substitution of up to 50 aa, preferably up to 20 aa, most
preferably up to 10 aa within their sequence.
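Substitutions such as D367G follow the usual notation of wild-type residue, position and new residue. A small helper that applies a set of them to a sequence, checking the wild-type residue at each position, might look as follows. It assumes 1-based numbering of the starting sequence; because SEQ ID NO:8 is not reproduced here, the example uses a short hypothetical sequence.

```python
import re

_SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def apply_substitutions(sequence: str, substitutions: list[str]) -> str:
    """Apply point substitutions written as e.g. 'D367G' (1-based positions)."""
    residues = list(sequence)
    for code in substitutions:
        match = _SUBSTITUTION.fullmatch(code)
        if match is None:
            raise ValueError(f"cannot parse substitution {code!r}")
        wild_type, position, new = match.group(1), int(match.group(2)), match.group(3)
        if not 1 <= position <= len(residues):
            raise ValueError(f"{code}: position outside the sequence")
        if residues[position - 1] != wild_type:
            raise ValueError(f"{code}: found {residues[position - 1]} at position {position}")
        residues[position - 1] = new
    return "".join(residues)


if __name__ == "__main__":
    print(apply_substitutions("MKDLAV", ["K2R", "V6I"]))  # MRDLAI
```

Checking the wild-type residue guards against numbering offsets, which matter here because truncated BAR1 variants are also contemplated as starting proteases.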

Detailed Description of the Figures 


[0075] Figure 1 depicts schematically the two alternatives A and B of the method of the invention. Starting with a first
protease, the aim of the invention is the generation of an evolved protease with a high specificity for a target peptide
substrate which is characterized by its amino acid sequence.

For the purpose of this figure, different shapes represent different amino acid residues, and the inverse profiles of the
shapes represent the protease's recognition sites. Shapes with a swung tilde at the top represent any
amino acid residue at that position. The active site of the enzyme is indicated by an asterisk, and the arrow indicates
the cleavage site within the substrate.

The type of the one or more proteases used as first proteases defines whether alternative A or alternative B is to be
employed. In alternative A, the first protease is characterized by an already high specificity towards a defined first
substrate. According to the method of the invention, this specificity has to be changed qualitatively into the target
specificity. In alternative B, the first protease has a relatively low specificity, i.e. it does not discriminate between a pool
of substrates that differ, for example, at positions P2, P1' and/or P2'. By the process of the invention, only the quantitative
specificity of those proteases is increased towards the value of the target specificity.

[0076] Figure 2 distinguishes the two alternatives A and B of the method of the invention by showing schematically
the qualitative and quantitative changes in specificity during evolution towards the target specificity. The quantitative
specificity s, as defined in the framework of this invention, refers to the ratio between all accepted and all possible
substrates. The qualitative specificity refers to the amino acid composition and sequence of accepted substrates.
Specificities of the first proteases (open circles) and the evolved protease (filled circle) are indicated schematically. In
alternative A, the first protease has a quantitative specificity in the range of the target specificity, but a qualitative specificity
that differs from the target specificity. In order to generate the target specificity, the specificity is changed qualitatively
only. In alternative B, the first protease has the qualitative specificity of the target substrate, but a quantitative specificity
that is far below the target specificity. In order to generate the target specificity, the specificity is changed quantitatively
only. When both alternatives are combined, the first protease has neither the qualitative nor the quantitative specificity
of the target substrate. In order to generate the target specificity, the specificity is changed quantitatively and qualitatively.
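The quantitative specificity s, the ratio of accepted to possible substrates, can be computed explicitly for a toy recognition rule by enumerating every peptide in a short window. The sketch assumes a window of three positions and hypothetical yes/no acceptance rules; the acceptance of a real protease is graded rather than binary.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def quantitative_specificity(accepts, window: int) -> float:
    """s = accepted / possible, over all peptides of the given length."""
    total = accepted = 0
    for residues in product(AMINO_ACIDS, repeat=window):
        total += 1
        accepted += bool(accepts("".join(residues)))
    return accepted / total


if __name__ == "__main__":
    # Hypothetical rules over a three-residue window (e.g. P1, P1', P2').
    rules = {
        "Arg at the first position only": lambda p: p[0] == "R",
        "Arg or Lys first, not Pro second": lambda p: p[0] in "RK" and p[1] != "P",
        "exactly R-V-V": lambda p: p == "RVV",
    }
    for name, rule in rules.items():
        print(f"{name}: s = {quantitative_specificity(rule, 3):.6f}")
```

The broad rule accepts 400 of 8,000 tripeptides (s = 0.05), the strict one a single tripeptide (s = 0.000125); alternative B corresponds to moving from the former towards the latter without changing which residues are recognized.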
[0077] Figure 3 illustrates schematically how proteases with changed catalytic activities are evolved using the two alternatives A and B of the method of the invention. According to the method of the invention, the catalytic activity (abbreviated as A) can be used as a selection parameter. In alternative A, the first protease hydrolyzes only substrate 1, whereas other substrates, including the target substrate (T), are hydrolyzed not at all or only very slowly. By use of the method of the invention, proteases are evolved that specifically hydrolyze the target substrate, whereas other substrates, including the first substrate, are hydrolyzed not at all or only very slowly. These evolved proteases are selected by an increase of catalytic activity on the target substrate and a decrease of catalytic activity on the first substrate (comparison approach). Alternatively, selection can be based on the affinity towards the target substrate (affinity approach). In alternative B, the first protease hydrolyzes all substrates, including the target substrate (T). By use of the method of the invention, proteases are evolved that specifically hydrolyze the target substrate, whereas other substrates, including the first substrate, are hydrolyzed not at all or only very slowly. These proteases are selected by screening with an excess of competing substrates (competitor approach) or by screening for higher substrate affinity (affinity approach). In general, the evolved protease can be identified by comparing its catalytic activity towards the substrates offered, including the first substrate and the target substrate.
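The comparison approach can be sketched as a filter followed by a ranking. In this minimal Python sketch the `Variant` record, the activity values and the target/first activity ratio used for ranking are all assumptions for illustration; the text above only requires an increase on the target substrate and a decrease on the first substrate.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    activity_first: float   # activity on the first substrate (arbitrary units)
    activity_target: float  # activity on the target substrate (arbitrary units)


def comparison_select(parent: Variant, variants: list[Variant]) -> list[Variant]:
    """Keep variants whose activity on the target substrate increased and whose
    activity on the first substrate decreased relative to the parent, then rank
    them by a simple target/first activity ratio (an assumed selectivity proxy)."""
    hits = [
        v for v in variants
        if v.activity_target > parent.activity_target
        and v.activity_first < parent.activity_first
    ]
    eps = 1e-12  # avoids division by zero when a variant no longer cuts substrate 1
    return sorted(hits, key=lambda v: v.activity_target / (v.activity_first + eps), reverse=True)


# Hypothetical activities, not measured values.
parent = Variant("first protease", activity_first=10.0, activity_target=0.1)
population = [
    Variant("v1", activity_first=5.0, activity_target=2.0),
    Variant("v2", activity_first=12.0, activity_target=3.0),  # cuts substrate 1 even more
    Variant("v3", activity_first=1.0, activity_target=4.0),
]
ranked = comparison_select(parent, population)
```

Here `v2` is discarded despite its gain on the target, because its activity on the first substrate also rose.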

[0078] Figure 4 depicts schematically, in two different forms, the intermediate approach as one particular aspect of the invention. For a description of the symbols, refer to Figure 1. The intermediate approach uses one or more intermediate substrates to guide the evolution of specificity gradually towards the target specificity in steps as small as necessary. Intermediate substrates are substrates that have an intermediate character when compared with the first and the target substrate. Intermediates can be classified into two forms. First, intermediate substrates can be provided by replacing at least one but fewer than all amino acid residues of the first substrate with amino acid residues from the target substrate (intermediate with respect to amino acid composition, Approach 1). Second, intermediate substrates can be provided by selectively introducing, at defined positions of the substrate, amino acid residues whose properties range between those of the corresponding amino acid residues in the first and the target substrate (intermediate with respect to amino acid properties, Approach 2). As a further alternative, both intermediate approaches can be combined. Preferably, as shown in the figure, the second approach is implemented into the first approach whenever the step between two intermediates is too large.
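Approach 1 can be enumerated mechanically when the first and target substrates are aligned position by position. The sketch below assumes equal-length aligned peptides and takes the sequences from SEQ ID NO:1 (first substrate of BAR1) and SEQ ID NO:6 (t-PA-type target) of the sequence listing; the function name is hypothetical, and Approach 2 (property intermediates) is not covered.

```python
from itertools import combinations


def composition_intermediates(first: str, target: str) -> list[str]:
    """Approach 1: replace at least one but not all of the positions at which
    `first` and `target` differ with the corresponding target residue."""
    if len(first) != len(target):
        raise ValueError("substrates must be aligned and of equal length")
    differing = [i for i, (a, b) in enumerate(zip(first, target)) if a != b]
    intermediates = []
    for k in range(1, len(differing)):  # 1 .. len(differing) - 1 replacements
        for positions in combinations(differing, k):
            residues = list(first)
            for i in positions:
                residues[i] = target[i]
            intermediates.append("".join(residues))
    return intermediates


FIRST = "WLQLKPGQ"   # SEQ ID NO:1
TARGET = "CPGRVVGG"  # SEQ ID NO:6
candidates = composition_intermediates(FIRST, TARGET)
```

The two peptides differ at 7 of 8 positions, giving 2^7 - 2 = 126 composition intermediates. WLGLVPGG (SEQ ID NO:2) is among them; WLGQVPGG (SEQ ID NO:3) is not, because its P1 glutamine comes from neither parent, which makes it a property intermediate (Approach 2).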

[0079] Figure 5 illustrates schematically how, according to the invention, proteases with changed catalytic activities are evolved using the intermediate approach. The first protease has a high activity on a first substrate (1) and no or very low activity on all other substrates, including the target substrate (T). The essential next step is the provision of an intermediate substrate (2), as illustrated in Figure 4. By screening for catalytic activity on this substrate, protease variants with increased activity on this intermediate substrate are selected. This intermediate step can be repeated with a gradual variation of the intermediate substrate towards the nature of the target substrate, until an evolved protease is isolated that shows catalytic activity on the target substrate only and no or very low activity on the first substrate and other substrates.

[0080] Figure 6 shows schematically the shuttle vector pPDE that can be used for the method of the invention. The vector comprises an S. cerevisiae origin (2µ ori), an E. coli origin (pMB1 ori), an S. cerevisiae marker (URA3), an E. coli marker (AmpR), and the expression cassette, which is composed of a galactose-inducible S. cerevisiae promoter (GAL), a signal sequence for secretion of the expressed protein (signal), a KpnI and an XhoI recognition site for inserting the gene of interest, and a terminator (Cyc1).

[0081] Figure 7 shows, by way of example, the hydrolysis of a peptide substrate catalyzed by the tobacco etch virus (TEV) protease, monitored by cross-correlation confocal fluorescence spectroscopy (cc-FCS). The peptide substrate with the sequence ENLYFQS is specifically recognized and hydrolyzed by the TEV protease. 100 nM double-labeled peptide (Alexa 488, Cy5) was incubated with (filled squares) and without (open circles) addition of 0.01 U/µl protease in assay buffer containing 50 mM Tris-HCl pH 8.0, 0.5 mM EDTA, 10 mM DTT, 0.05% glycerol.

[0082] Figure 8 shows, by way of example, a distribution of catalytic activities obtained by screening a population of protease variants on the substrate WLGLVPGG (intermediate 1, see Example VI) using confocal fluorescence spectroscopy. Shown is the frequency N with which a certain catalytic activity (performance, arbitrary units) is identified. Low values represent low catalytic activities, whereas high values represent high catalytic activities on the substrate. Genes encoding the variants with the highest performance values are isolated, and the variants are evaluated with respect to their specificity. These variants are then used as first proteases for the next cycle. This procedure is repeated until protease variants with the target specificity are identified.

[0083] Figure 9 shows, by way of example, the decrease in KM during evolution towards higher affinity using the affinity approach of the invention. The protease used as first protease (wild type) in this experiment was subtilisin E from B. subtilis, which had a KM of 194 µM. Using the method of the invention, this KM was gradually decreased by a factor of about 7.5, down to 26 µM.
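As a quick arithmetic check of the factor quoted for Figure 9, using only the two KM values given there:

```latex
\frac{K_M^{\text{wild type}}}{K_M^{\text{evolved}}} = \frac{194\ \mu\text{M}}{26\ \mu\text{M}} \approx 7.46 \approx 7.5
```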

[0084] Figure 10 shows, by way of example, the change in specificity during evolution of proteases towards the specificity of t-PA. The activity of variants 1, 2, and 3 was evaluated using the substrates intermediate 1, intermediate 2, and intermediate 3 of Example VI. The decrease in substrate concentration corresponds to proteolytic activity: the faster the decrease, the higher the catalytic activity of the protease variant. While the first protease has very low activity on intermediate 1 and no activity on intermediates 2 or 3, the evolved variants show various activities on the three intermediate substrates.

[0085] Figure 11 depicts schematically a preferred variant of the intermediate approach of the invention (fourth aspect, see below), in which proteases are first selected in parallel according to their specificity for different intermediate substrates. Proteases are then selected according to their specificity for the target substrate from a population containing recombined variants of the protease variants selected in the first step.

[0086] Figure 12 shows, by way of example, kinetic progression curves for the first protease in comparison with an evolved protease obtained in round 5 of the optimisation method according to the invention. In the case of the first substrate, the activity of the evolved protease is lower than that of the first protease. This is inverted in the case of the 1st and 4th intermediates, on which the first protease shows very limited and no turnover, respectively.
[0087] The invention is further explained by the following Examples. It is understood that the examples and embodiments described therein are for illustrative purposes only and that various modifications or changes in light thereof will be suggested to persons skilled in the art and are to be included within the spirit and purview of this application and are considered within the scope of the appended claims. All publications, patents, and patent applications cited herein are hereby incorporated by reference in their entirety for all purposes.

Examples 


[0088] In the following examples, materials and methods of the present invention are provided, including the determination of catalytic properties of enzymes obtained by the method. It should be understood that these examples are for illustrative purposes only and are not to be construed as limiting this invention in any manner.

[0089] In the experimental examples described below, standard techniques of recombinant DNA technology were used that are described in various publications, e.g. Sambrook et al. (1989), Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory; Ausubel et al. (1987), Current Protocols in Molecular Biology 1987-1988, Wiley Interscience; or Methods in Yeast Genetics (1994), A Cold Spring Harbor Laboratory Manual, which are incorporated herein in their entirety by reference. Unless otherwise indicated, restriction enzymes, polymerases and other enzymes as well as DNA purification kits were used according to the manufacturers' specifications.


Example I: Molecular cloning of genes encoding protease variants 

[0090] Genes encoding protease variants were cloned into a vector suitable for extracellular expression of proteins by the yeast Saccharomyces cerevisiae. The vector used is a derivative of the plasmid pYES2, which is commercially available from Invitrogen, Inc. A map of the plasmid is shown in Figure 6. The vector contains a 2µ origin for amplification in S. cerevisiae, a pMB1 origin for amplification in E. coli, a URA3 marker for selection in S. cerevisiae, an ampicillin resistance marker for selection in E. coli, as well as a GAL promoter and a Cyc1 transcription terminator for inducible expression in S. cerevisiae. A 90 bp fragment that contains the leader sequence encoding the signal peptide from the BAR1 gene of S. cerevisiae was introduced behind the GAL1 promoter. The restriction sites KpnI and XhoI served as insertion sites for the heterologous genes to be expressed. Cloning of genes encoding protease variants was done as follows: the coding sequence of the mature protein was amplified by PCR using primers that introduced a KpnI site at the 5' end and an XhoI site at the 3' end. This PCR fragment was cloned into the appropriate sites of the vector, and its identity was confirmed by sequencing.
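The text does not say how inserts were checked before cloning. A common precaution, assumed here, is to confirm that the coding sequence carries no internal KpnI (GGTACC) or XhoI (CTCGAG) site, since such a site would also be cut by the KpnI/XhoI digest. The function names and the example fragment below are hypothetical.

```python
KPNI_SITE = "GGTACC"  # KpnI recognition sequence (cuts GGTAC^C)
XHOI_SITE = "CTCGAG"  # XhoI recognition sequence (cuts C^TCGAG)


def find_sites(dna: str, site: str) -> list[int]:
    """0-based start positions of `site` in `dna`. Both sites are palindromic,
    so scanning one strand also covers the complementary strand."""
    dna = dna.upper()
    return [i for i in range(len(dna) - len(site) + 1) if dna[i:i + len(site)] == site]


def internal_sites(coding_sequence: str) -> dict[str, list[int]]:
    """Report KpnI/XhoI sites inside a coding sequence before cloning."""
    return {"KpnI": find_sites(coding_sequence, KPNI_SITE),
            "XhoI": find_sites(coding_sequence, XHOI_SITE)}


def add_flanking_sites(coding_sequence: str) -> str:
    """Mimic the PCR product: KpnI site at the 5' end, XhoI site at the 3' end.
    (Real primers usually also carry a few extra bases outside each site.)"""
    return KPNI_SITE + coding_sequence.upper() + XHOI_SITE


insert = "ATGGCTAAAGGTACCTTT"  # hypothetical fragment with an internal KpnI site
report = internal_sites(insert)
```

A non-empty list in `report` flags a gene that would be cut internally during directional KpnI/XhoI cloning.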

Example II: Providing populations of protease variants

[0091] A population of protease variants was provided by random modification of genes encoding proteases with known substrate specificities, followed by expression of the protease variants encoded by these modified genes using S. cerevisiae as a suitable host organism. First, genes encoding protease variants with known substrate specificities were PCR-amplified under error-prone conditions, essentially as described by Cadwell, R.C. and Joyce, G.F. (PCR Methods Appl. 2 (1992) 28-33). Error-prone PCR was done using 30 pmol of each primer, 20 nmol dGTP and dATP, 100 nmol dCTP and dTTP, 20 fmol template, and 5 U Taq DNA polymerase in 10 mM Tris-HCl pH 7.6, 50 mM KCl, 7 mM MgCl2, 0.5 mM MnCl2, 0.01% gelatin for 20 cycles of 1 min at 94 °C, 1 min at 65 °C and 1 min at 72 °C. The resulting DNA library was purified using the QIAquick PCR Purification Kit following the supplier's instructions. PCR products were digested with the restriction enzymes XhoI and KpnI and purified as described in Example I. Afterwards, the PCR products were ligated into the vector, which had been digested with XhoI and KpnI, gel-purified and dephosphorylated. The ligation products were transformed into E. coli, amplified in LB containing ampicillin as marker, and the plasmids were purified using the Qiagen Plasmid Purification Kit following the supplier's instructions. The resulting plasmids were transformed into S. cerevisiae cells. Populations of protease variants were provided by inducing expression in the transformed S. cerevisiae cells by adding 2% galactose to the medium.

Alternatively, genes encoding protease variants with known substrate specificities were statistically recombined at homologous positions by use of the Recombination Chain Reaction, essentially as described in WO 01/34835. PCR products of the genes encoding the protease variants were purified using the QIAquick PCR Purification Kit following the supplier's instructions, checked for correct size by agarose gel electrophoresis, and mixed together in equimolar amounts. 80 µg of this PCR mix in 150 mM Tris-HCl pH 7.6, 6.6 mM MgCl2 were heated for 5 min at 94 °C and subsequently cooled down to 37 °C at 0.05 °C/sec in order to re-anneal strands and thereby produce heteroduplexes in a stochastic manner. Then, 2.5 U Exonuclease III per µg DNA were added and incubated for 20, 40 or 60 min at 37 °C in order to digest different lengths from both 3' ends of the heteroduplexes. The partly digested PCR products were refilled with 0.6 U Pfu polymerase per µg DNA by incubating for 15 min at 72 °C in 0.17 mM dNTPs and Pfu polymerase buffer according to the supplier's instructions. After performing a single PCR cycle, the resulting DNA was purified using the QIAquick PCR Purification Kit following the supplier's instructions, digested with KpnI and XhoI and ligated into the linearized vector. The ligation products were transformed into E. coli, amplified in LB containing ampicillin as marker, and the plasmids were purified using the Qiagen Plasmid Purification Kit following the supplier's instructions. The resulting plasmids were transformed into S. cerevisiae cells. Populations of protease variants were provided by inducing expression in the transformed S. cerevisiae cells by adding 2% galactose to the medium.
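The recombination protocol above can be turned into concrete quantities. This sketch assumes that the entire 80 µg mix is treated in one reaction; the text does not say whether it is split across the 20, 40 and 60 min Exonuclease III digests. The helper names are hypothetical.

```python
def ramp_seconds(start_c: float, end_c: float, rate_c_per_s: float) -> float:
    """Duration of a linear temperature ramp."""
    return abs(start_c - end_c) / rate_c_per_s


def enzyme_units(dna_ug: float, units_per_ug: float) -> float:
    """Enzyme amount for a dose given per microgram of DNA."""
    return dna_ug * units_per_ug


DNA_UG = 80.0  # amount of equimolar PCR mix re-annealed, as stated above

ramp_min = ramp_seconds(94.0, 37.0, 0.05) / 60.0  # 57 °C at 0.05 °C/s = 1140 s = 19 min
exo3_units = enzyme_units(DNA_UG, 2.5)            # Exonuclease III at 2.5 U per µg
pfu_units = enzyme_units(DNA_UG, 0.6)             # Pfu polymerase at 0.6 U per µg
```

Under this assumption, the slow re-annealing ramp takes about 19 min and the full 80 µg requires 200 U Exonuclease III and 48 U Pfu polymerase.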

Example III: Providing peptide substrates that resemble the target substrate 


[0092] All peptide substrates were synthesized on a peptide synthesizer using the approach of Merrifield et al. (Nature 207 (1965) 522-523). Peptide substrates that resemble the target substrate were designed by substituting the amino acid residues at one or more positions of the first peptide substrate with the amino acid residues at the corresponding positions of the target substrate. Alternatively, the amino acid residues at one or more positions of the first peptide substrate were substituted with amino acid residues that have an intermediate character with respect to the amino acid residues of the first peptide substrate and those of the target peptide substrate. For the determination of the intermediate character of amino acid residues, refer to Table III. Marker fluorophores were attached to the peptide substrates either via the amino group of the N-terminus or via the carboxy group of the C-terminus. Alternatively, a cysteine residue was added at either the N-terminus or the C-terminus of the peptide, and the marker fluorophore was chemically attached to the thiol group of the cysteine residue.

Alexa 488 (Molecular Probes Inc., Oregon, USA) and Cy5 (Amersham Biosciences Europe GmbH, Freiburg, Germany) were typically used as fluorophore markers. Protease cleavage of the peptide substrate was monitored by cross-correlation FCS (Proc. Natl. Acad. Sci. USA 95 (1998) 1416-1420). As an example, Figure 7 shows the cleavage of a peptide substrate that contains the target substrate for tobacco etch virus protease (TEV protease) and has the Alexa 488 fluorophore attached to the C-terminus and the Cy5 fluorophore attached to the N-terminus of the peptide. The TEV protease already has a relatively high specificity (s = 4.9, see Table I). Cleavage was done at a peptide concentration of 100 nM by adding 0.01 U/µl TEV protease in assay buffer containing 50 mM Tris-HCl pH 8.0, 0.5 mM EDTA, 10 mM DTT, and 0.05% glycerol.

Example IV: Screening procedure

[0093] In order to identify enzyme variants having the desired substrate specificity, a screening approach based on a confocal fluorescence spectroscopy set-up as disclosed in WO 94/16313 was used. Either the cell suspension of an S. cerevisiae culture directly or an aliquot of the cell-free supernatant was used as the sample containing the secreted protease variant. After adding the substrate to the sample and incubating for a certain period of time, the samples were subjected to measurement by confocal fluorescence spectroscopy. If necessary, this procedure was repeated several times in order to measure the kinetics of the proteolytic cleavage. Subsequently, the samples were ranked according to proteolytic activity, and samples exceeding a certain activity threshold were identified in order to isolate the gene encoding the corresponding protease variant. The distribution of proteolytic activities of protease variants obtained by this procedure is shown in Figure 8.
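The ranking and threshold step, together with a Figure 8-style frequency distribution, can be sketched as follows. The well IDs, activity values, threshold and bin width are hypothetical; the text does not state how the threshold was chosen.

```python
from collections import Counter


def select_hits(performance: dict[str, float], threshold: float) -> list[str]:
    """Rank samples by measured proteolytic activity (highest first) and return
    those above the activity threshold; their genes would then be isolated."""
    ranked = sorted(performance.items(), key=lambda item: item[1], reverse=True)
    return [sample for sample, value in ranked if value > threshold]


def performance_histogram(values: list[float], bin_width: float) -> Counter:
    """Frequency N of variants per performance bin (lower bin edge as key)."""
    return Counter(int(v // bin_width) * bin_width for v in values)


# Hypothetical well IDs and activities (arbitrary units).
screen = {"A1": 0.2, "B7": 3.1, "C3": 1.4, "D5": 0.4}
hits = select_hits(screen, threshold=1.0)
histogram = performance_histogram(list(screen.values()), bin_width=1.0)
```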
Example V: Generating sequence-specific proteases with increased affinity towards the target peptide substrate by screening at low substrate concentrations

[0094] Protease variants that have an increased affinity towards the target peptide substrate were generated by the method of the invention based on screening at low substrate concentrations. By means of error-prone PCR (according to Cadwell, R.C. and Joyce, G.F., PCR Methods Appl. 2 (1992) 28-33), a population of protease variants was generated that is related to the alkaline protease subtilisin E from Bacillus subtilis, which has a relatively low specificity (s = 0.82). This correlates with its relatively high KM, which is in the range of 150-200 µM. The population of protease variants was screened at a complexity of 10^6 variants by confocal fluorescence spectroscopy, employing substrate concentrations in the range of 10 nM. Variants isolated in this first screen were used as first proteases in a second cycle to provide another population of protease variants. Analogously, variants isolated in subsequent cycles were used as first proteases in the following cycle. The population of variants provided in the second cycle and all subsequent cycles was generated by a combination of error-prone PCR (see above) and in-vitro homologous recombination (according to WO 01/34835). Variants isolated from the first four cycles of this procedure were analyzed kinetically. The increase in affinity towards the substrate over the four rounds corresponds to the decrease in KM of the best performers of each cycle, which is shown in Figure 9.
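A standard Michaelis-Menten argument, offered here as an interpretation rather than as the authors' stated rationale, explains why screening far below KM favours affinity. When [S] is much smaller than KM, v = kcat[E][S]/(KM + [S]) reduces to approximately (kcat/KM)[E][S], so at constant kcat a lower KM raises the measured activity almost proportionally. The sketch uses the KM values of Figure 9; kcat and [E] are hypothetical and cancel in the ratio.

```python
def mm_rate(kcat_per_s: float, km_m: float, enzyme_m: float, substrate_m: float) -> float:
    """Michaelis-Menten initial rate v = kcat*[E]*[S] / (KM + [S]), in M/s."""
    return kcat_per_s * enzyme_m * substrate_m / (km_m + substrate_m)


S = 10e-9     # screening substrate concentration (~10 nM, as above)
E = 1e-9      # hypothetical enzyme concentration (cancels in the ratio)
KCAT = 1.0    # hypothetical turnover number, assumed unchanged by evolution

v_first = mm_rate(KCAT, 194e-6, E, S)    # KM of the first protease (Figure 9)
v_evolved = mm_rate(KCAT, 26e-6, E, S)   # KM after four cycles (Figure 9)
gain = v_evolved / v_first               # = (194 µM + [S]) / (26 µM + [S])
```

At 10 nM substrate, the apparent activity gain (about 7.46) is essentially the full KM ratio. At saturating substrate the same two variants would look alike, which is why a high-concentration screen would not reward the affinity gain.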

Example VI: Generating sequence-specific proteases with a target substrate specificity resembling the specificity of 
tissue-type plasminogen activator 


[0095] Proteases were generated by the method of the invention that had a specificity altered towards the specificity of tissue-type plasminogen activator (t-PA). The BAR1 protease from Saccharomyces cerevisiae (SEQ ID NO:8) was used as first protease. This protease belongs to the group of aspartic proteinases (MacKay et al., Structure and Function of the Aspartic Proteinases (1991) 161-172). It is specific for peptide substrates containing the amino acid sequence WLQLKPGQ and catalyzes the cleavage of the peptide bond between the second leucine and the lysine residue. Populations of protease variants related to the BAR1 protease, or to proteases isolated in subsequent screening cycles, were generated by means of error-prone PCR (according to Cadwell, R.C. and Joyce, G.F., PCR Methods Appl. 2 (1992) 28-33) and in-vitro homologous recombination using the Recombination Chain Reaction (WO 01/34835). Protease variants were screened for proteolytic activity at complexities of 10^6 variants by confocal fluorescence spectroscopy. The BAR1 protease used as first protease already had a relatively high specificity, which was in the range of the target specificity. Therefore, a combination of the affinity approach and the intermediate approach was used. Screening at low concentrations kept the specificity of the protease high, while screening on intermediate substrates enabled the evolution towards the new specificity. Four intermediate substrates were constructed. Intermediate substrate 1 had the amino acid sequence WLGLVPGG, intermediate substrate 2 the amino acid sequence WLGQVPGG, intermediate substrate 3 the amino acid sequence WLGRVPGG, and intermediate substrate 4 the sequence WLGRVVGG. The target substrate specificity of t-PA is directed to CPGRVVGG, with cleavage between the arginine residue and the first valine residue. All substrates are shown in Table IV.

Table IV
[0096] Intermediate 1 was an amino acid composition intermediate, since it contained at positions P4, P3, P1 and P2' the same amino acid residues as the first substrate, and at positions P2, P1' and P4' the same residues as the target substrate. Intermediate 2 was an amino acid property intermediate with regard to intermediate 1 and the target substrate. It resembled intermediate 1 but contained at position P1 a glutamine residue, which has an intermediate character compared with the leucine residue present at that position in the first substrate and the arginine residue present at that position in the target substrate. Intermediate 3, another amino acid composition intermediate, was based on amino acid residues stemming from both the first substrate and the target substrate, as intermediate substrate 1 is, but, in contrast to the latter, shared one additional position with the target substrate. Compared with intermediate 3, intermediate 4 shares one further amino acid with the target sequence, at position P2'. The changed specificities of different variants generated by this method are shown in Figure 10.
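The position-by-position relationships described for the intermediates can be checked directly. This sketch takes the sequences from SEQ ID NO:1 to 6 of the sequence listing and assumes that the scissile bond lies between residues 4 and 5 of each octapeptide (Leu-Lys for BAR1, Arg-Val for t-PA, as stated above), which fixes the P4 to P4' labels. The function and constant names are hypothetical.

```python
POSITIONS = ["P4", "P3", "P2", "P1", "P1'", "P2'", "P3'", "P4'"]

FIRST = "WLQLKPGQ"   # SEQ ID NO:1
TARGET = "CPGRVVGG"  # SEQ ID NO:6
INTERMEDIATES = {
    "intermediate 1": "WLGLVPGG",  # SEQ ID NO:2
    "intermediate 2": "WLGQVPGG",  # SEQ ID NO:3
    "intermediate 3": "WLGRVPGG",  # SEQ ID NO:4
    "intermediate 4": "WLGRVVGG",  # SEQ ID NO:5
}


def origin_by_position(seq: str, first: str = FIRST, target: str = TARGET) -> list[str]:
    """Tag each residue: 'F' = as in the first substrate, 'T' = as in the target,
    'FT' = shared by both, '-' = from neither (a property-intermediate residue)."""
    tags = []
    for s, f, t in zip(seq, first, target):
        tag = ("F" if s == f else "") + ("T" if s == t else "")
        tags.append(tag or "-")
    return tags


for name, seq in INTERMEDIATES.items():
    print(f"{name:15}", " ".join(f"{p}={tag}" for p, tag in zip(POSITIONS, origin_by_position(seq))))
```

The output reproduces the text: intermediate 1 matches the first substrate at P4, P3, P1 and P2' and the target at P2, P1' and P4' (P3' glycine is common to both), and the P1 glutamine of intermediate 2 is the only residue taken from neither parent.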

[0097] An increase in substrate specificity can also be measured as time-dependent conversion of the substrates, as demonstrated by way of example in Figure 12. The substrate conversion is presented as the fraction of non-converted substrate over time. As shown in Figure 12, the first protease and an evolved variant of round 5 differ in their proteolytic activity on the first substrate, intermediate 1 and intermediate 4. In the case of the first substrate, the activity of the evolved protease is lower than that of the first protease. This is inverted in the case of the 1st and 4th intermediates, on which the first protease shows very limited and no turnover of the substrate, respectively, while the evolved protease shows considerable activity on both substrates.
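Progression curves such as those in Figure 12 can be condensed into one rate constant per protease and substrate. The text does not state a fit model. This sketch assumes single-exponential (first-order) substrate depletion, f(t) = exp(-kt), which is a reasonable approximation when [S] is far below KM. It uses synthetic values, not data from Figure 12, and the function names are hypothetical.

```python
import math


def first_order_rate(times_s: list[float], fraction_remaining: list[float]) -> float:
    """Estimate k in f(t) = exp(-k*t) by least squares on ln f(t) = -k*t
    (a line through the origin). Every fraction must be > 0."""
    num = sum(t * math.log(f) for t, f in zip(times_s, fraction_remaining))
    den = sum(t * t for t in times_s)
    return -num / den


def half_life(k_per_s: float) -> float:
    """Time for half of the substrate to be converted."""
    return math.log(2) / k_per_s


# Synthetic curve with k = 0.002 s^-1 (illustrative, not measured).
times = [0.0, 60.0, 120.0, 300.0, 600.0]
fractions = [math.exp(-0.002 * t) for t in times]
k_est = first_order_rate(times, fractions)
```

Comparing k for the first and the evolved protease on each substrate gives a single number per curve. The specificity shift then appears as a lower k on the first substrate and a higher k on intermediates 1 and 4.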

[0098] In this way, proteases are generated according to the method of the invention that have a substrate specificity similar to that of human tissue-type plasminogen activator. The proteases generated have at least one mutation at a position selected from the group: 33, 45, 47, 59, 82, 96, 107, 123, 143, 151, 152, 161, 163, 165, 178, 221, 231, 321, 367, 369, 370, 399, 404, 440 (based on the numbering of the amino acid sequence of the protease BAR1 listed as SEQ ID NO:8). Preferably, a protease variant evolved from the BAR1 wild-type protease towards the specificity of human tissue-type plasminogen activator has at least one mutation selected from the group: D367G, M369L, V370I, M107I, I152F, E143D, E231V, L33I, Y45D, T47A, T59I, N82D, E96V, N123D, N151V, K161E, A163T, T165A, R178S, T221I, D321N, A399S, K404R and S440L.
[0099] Figure 12 presents the catalytic behaviour of a protease evolved according to the method of the invention in comparison with the starting (first) protease. Starting with the BAR1 protease (SEQ ID NO:8), variants with different mutations are obtained. Figure 12 shows plots reflecting the increase in substrate specificity of a variant of round 5. Analysis of the amino acid sequence of this exemplified round-5 variant revealed a particular combination of amino acid substitutions (numbered as in the BAR1 protease): Y45D, T47A, N82D, M107I, E143D, I152F, T165A, E231V, D367G, V370I.
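The two mutation lists can be cross-checked programmatically. This sketch parses the one-letter substitution codes, confirms that their positions reproduce the position group given in [0098], and checks that every substitution of the round-5 variant belongs to the preferred set. It does not check the wild-type residues against SEQ ID NO:8. The function name is hypothetical.

```python
import re

SUBSTITUTION = re.compile(r"([ACDEFGHIKLMNPQRSTVWY])(\d+)([ACDEFGHIKLMNPQRSTVWY])")


def parse_substitution(code: str) -> tuple[str, int, str]:
    """Split e.g. 'D367G' into (wild-type residue, position, new residue)."""
    match = SUBSTITUTION.fullmatch(code)
    if match is None:
        raise ValueError(f"not a single-residue substitution: {code!r}")
    wild_type, position, new = match.groups()
    return wild_type, int(position), new


# Substitutions listed in [0098] (BAR1 numbering, SEQ ID NO:8).
PREFERRED = (
    "D367G M369L V370I M107I I152F E143D E231V L33I Y45D T47A T59I N82D "
    "E96V N123D N151V K161E A163T T165A R178S T221I D321N A399S K404R S440L"
).split()
# Substitutions reported in [0099] for the round-5 variant.
ROUND5 = "Y45D T47A N82D M107I E143D I152F T165A E231V D367G V370I".split()

preferred_positions = sorted(parse_substitution(code)[1] for code in PREFERRED)
round5_subset = set(ROUND5) <= set(PREFERRED)
```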
[0100]



<110> DIREVO Biotech AG 
<120> Process for Generating Sequence-Specific Proteases by Directed Evolution and Use Thereof

<130> 031139wo/JH/ml 

<140>
<141> 

<160> 8 

<170> PatentIn Ver. 2.1

<210> 1 
<211> 8 
<212> PRT
<213> Artificial Sequence

<220> 

<223> Description of Artificial Sequence: Protease substrate 
<400> 1

Trp Leu Gln Leu Lys Pro Gly Gln
1 5



<210> 2 
<211> 8
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Protease substrate 
<400> 2 

Trp Leu Gly Leu Val Pro Gly Gly 

1 5 



<210> 3
<211> 8 
<212> PRT 

<213> Artificial Sequence 

<220>

<223> Description of Artificial Sequence: Protease substrate 

<400> 3 

Trp Leu Gly Gln Val Pro Gly Gly
1 5



<210> 4

<211> 8 
<212> PRT 

<213> Artificial Sequence 
<220>

<223> Description of Artificial Sequence: Protease substrate 
<400> 4 

Trp Leu Gly Arg Val Pro Gly Gly
1 5 



<210> 5
<211> 8
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: Protease substrate 
<400> 5 

Trp Leu Gly Arg Val Val Gly Gly 
1 5 



<210> 6
<211> 8
<212> PRT

<213> Artificial Sequence
<220> 

<223> Description of Artificial Sequence: Protease substrate 
<400> 6 

Cys Pro Gly Arg Val Val Gly Gly
1 5 



<210> 7 
<211> 7 
<212> PRT 

<213> Artificial Sequence 
<220> 

<223> Description of Artificial Sequence: TEV protease substrate
<400> 7 

Glu Asn Leu Tyr Phe Gln Ser
1 5 



<210> 8 
<211> 587 
<212> PRT 

<213> Saccharomyces cerevisiae 
<400> 8



Met Ser Ala Ile Asn His Leu Cys Leu Lys Leu Ile Leu Ala Ser Phe
1 5 10 15

Val Ile Ile Asn Thr Ile Thr Ala Leu Thr Asn Asp Gly Thr Gly His

20 25 30






Leu Glu Phe Leu Leu Gln His Glu Glu Glu Met Tyr Tyr Ala Thr Thr
35 40 45

Leu Asp Ile Gly Thr Pro Ser Gln Ser Leu Thr Val Leu Phe Asp Thr
50 55 60

Gly Ser Ala Asp Phe Trp Val Met Asp Ser Ser Asn Pro Phe Cys Leu

70 75 80

Pro Asn Ser Asn Thr Ser Tyr Ser Asn Ala Thr Tyr Asn Gly Glu
85 90 95

Glu Val Lys Pro Ser Asp Cys Arg Ser Met Ser Thr Tyr Asn Glu
100 105 110

His Arg Ser Ser Thr Tyr Gln Tyr Leu Glu Asn Gly Arg Phe Tyr Ile
120



Thr Tyr Ala Asp Gly Thr Phe Ala Asp Gly Ser Trp Gly Thr Glu Thr
130 135 140

Val Ser Ile Asn Gly Ile Asp Ile Pro Asn Ile Gln Phe Gly Val Ala
145 150 155 160

Lys Tyr Ala Thr Thr Pro Val Ser Gly Val Leu Gly Ile Gly Phe Pro
165 170 175

Arg Glu Ser Val Lys Gly Tyr Glu Gly Ala Pro Asn Glu Tyr Tyr
180 185 190

Asn Phe Pro Gln Ile Leu Lys Ser Glu Lys Ile Ile Asp Val Val
200 205

Ala Tyr Ser Leu Phe Leu Asn Pro Asp Ser Gly Thr Gly Ser Ile
210 215 220

Val Phe Gly Ala Ile Asp Glu Lys Phe Ser Gly Asp Leu Phe Thr
230 235 240

Phe Pro Met Val Asn Glu Tyr Pro Thr Ile Val Asp Ala Pro Ala Thr
250 255

Leu Ala Met Thr Ile Gln Gly Leu Gly Ala Gln Asn Lys Ser Ser Cys
260 265 270



Glu His Glu Thr Phe Thr Thr Thr Lys Tyr Pro Val Leu Leu Asp Ser
280

Gly Thr Ser Leu Leu Asn Ala Pro Lys Val Ile Ala Asp Lys Met Ala
290 295 300

Ser Phe Val Asn Ala Ser Tyr Ser Glu Glu Glu Gly Ile Tyr Ile Leu
305 310 315 320

Asp Cys Val Ser Val Gly Asp Val Glu Tyr Asn Phe Asp Phe Gly
325 330 335

Asp Leu Gln Ile Ser Val Pro Leu Ser Ser Leu Ile Leu Ser Pro Glu
340 345 350

Thr Glu Gly Ser Tyr Cys Gly Phe Ala Val Gln Pro Thr Asn Asp Ser
355 360 365
355 360 365 
Met Val Leu Gly Asp Val Phe Leu Ser Ser Ala Tyr Val Val Phe Asp
370 375 380

Leu Asp Asn Tyr Lys Ile Ser Leu Ala Gln Ala Asn Trp Asn Ala Ser
385 390 395 400

Glu Val Ser Lys Lys Leu Val Asn Ile Gln Thr Asp Gly Ser Ile Ser
405 410 415

Gly Ala Lys Ile Ala Thr Ala Glu Pro Trp Ser Thr Asn Glu Pro Phe
420 425 430

Thr Val Thr Ser Asp Ile Tyr Ser Ser Thr Gly Cys Lys Ser Arg Pro
435 440

Phe Leu Gln Ser Ser Thr Ala Ser Ser Leu Ile Ala Glu Thr Asn Val
450 455 460

Gln Ser Arg Asn Cys Ser Thr Lys Met Pro Gly Thr Arg Ser Thr Thr
465 470 475 480

Val Leu Ser Lys Pro Thr Gln Asn Ser Ala Met His Gln Ser Thr Gly
485 490 495

Ala Val Thr Gln Thr Ser Asn Glu Thr Lys Leu Glu Leu Ser Ser Thr
500 505 510

Met Ala Asn Ser Gly Ser Val Ser Leu Pro Thr Ser Asn Ser Ile Asp

Lys Glu Phe Glu His Ser Lys Ser Gln Thr Thr Ser Asp Pro Ser Val
530 535 540

Ala Glu His Ser Thr Phe Asn Gln Thr Phe Val His Glu Thr Lys Tyr
550 555 560

Arg Pro Thr His Lys Thr Val Ile Thr Glu Thr Val Thr Lys Tyr Ser
565 570

Thr Val Leu Ile Asn Val Cys Lys Thr Tyr
580 585




Claims 

1. A method for identifying sequence-specific proteases with target substrate specificities, which comprises the following steps:

(a) providing a population of proteases comprised of variants of one first protease or of variants or chimeras of 
two or more first proteases, said first proteases having a substrate specificity for a particular amino acid sequence 
of a first peptide substrate; 

(b) contacting said population of proteases with one or more second substrates, comprising at least one specific amino acid sequence being identical to the target peptide substrate or having an intermediate character with regard to the first substrate and the target substrate but being not present within the first peptide substrate; and
(c) selecting one or more protease variants from the population of proteases provided in step (a) having specificity for said specific amino acid sequence of the second substrates provided in step (b) under conditions that allow identification of proteases that recognize and hydrolyse preferably said specific one amino acid sequence within the second substrates, wherein the screening for protease activity is achieved by adding in excess peptides other than the second peptide, thereby using the added peptides as competitors.


2. The method of claim 1, wherein the selection conditions in step (c) further comprise

(i) screening for protease activity under low substrate concentrations, thereby increasing affinity for the second 
substrate, and/or 

(ii) screening for protease activity by using two or more substrates in comparison, thereby increasing the selectivity of the enzyme.

3. The method of claim 1 or 2, wherein steps (a) to (c) are repeated cyclically until one or more protease variants with 
specificity for the second substrate are identified, and wherein protease variants selected in one cycle are used as 

first proteases in the following cycle, and wherein at least one cycle and less than 100 cycles are performed.

4. The method according to any one of claims 1 to 3, wherein only one second substrate is used in the one or more 
cycles, and wherein the second substrate is identical with the target substrate. 

5. The method according to any one of claims 1 to 3, wherein different second substrates are used, and wherein the
second substrates have an intermediate character with regard to the first substrate and the target substrate, and 
wherein the second substrate that is used in the last cycle is identical with the target substrate. 

6. The method of claim 5, wherein different second substrates are used in consecutive cycles, and wherein each second substrate has an intermediate character with regard to the previously used second substrate and the target substrate.

7. The method of claim 5 or 6, wherein in at least one cycle steps (b) to (c) are carried out in parallel with different second substrates, and wherein the protease variants isolated in this parallel manner are combined and used as first proteases in the next cycle.


8. The method according to any one of claims 5 to 7, wherein the intermediate character of the intermediate substrates is based on

(i) the amino acid composition,

(ii) the amino acid sequence,

(iii) the physical and/or the chemical properties of the amino acid residues within the specific amino acid sequence, whereby preferably one or more properties from the group consisting of the following amino acid properties are used: the surface, the volume, the isoelectric point, the side-chain pKa, the charge, the polarity, the hydrophobicity, or

(iv) any combination thereof.
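One possible reading of item (iii), using hydrophobicity as the property, is that at every position the intermediate substrate's residue has a value lying between those of the first and the target substrate. The sketch below assumes that per-position rule and the Kyte-Doolittle hydropathy scale; neither the rule, the scale nor the function name `is_intermediate` is prescribed by the patent. The sequences are taken from claim 14, with the target read as CPGRVVGG.

```python
from typing import Dict

# Kyte-Doolittle hydropathy index, one of several possible hydrophobicity scales
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def is_intermediate(first: str, candidate: str, target: str,
                    scale: Dict[str, float] = KYTE_DOOLITTLE) -> bool:
    """True if, at every position, the candidate's property value lies
    between (or on) the values of the first and the target substrate."""
    if not len(first) == len(candidate) == len(target):
        raise ValueError("sequences must have equal length")
    for f, c, t in zip(first, candidate, target):
        low, high = sorted((scale[f], scale[t]))
        if not low <= scale[c] <= high:
            return False
    return True


print(is_intermediate("WLGLVPGG", "WLGQVPGG", "CPGRVVGG"))  # True
print(is_intermediate("WLGLVPGG", "WLGIVPGG", "CPGRVVGG"))  # False
```

Here Gln (-3.5) falls between Leu (3.8) and Arg (-4.5) at the fourth position, whereas Ile (4.5) lies outside that range.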

9. The method according to any one of claims 1 to 5, wherein the second substrates differ from the first substrates in 
that 1 to 5 amino acid residues within the specific amino acid sequence are exchanged. 

10. The method according to any one of claims 1 to 9, wherein the second substrates carry functional groups that enable the detection of the hydrolysis of the substrate, said functional groups being

(i) one or more fluorophores or chromophores whose spectroscopic properties change upon hydrolysis of the peptide, whereby screening is performed through determination of the change in spectroscopic properties; or

(ii) two fluorophores which are distinguishable by their fluorescence properties and which are attached to opposite ends of the second substrate, whereby the screening is performed through confocal fluorescence spectroscopy at fluorophore concentrations below 1 µM; or

(iii) two fluorophores which form a fluorescence resonance energy transfer (FRET) pair and which are attached to opposite ends of the second substrate, whereby screening is performed through determination of the decrease in the energy transfer between the two fluorophores; or

(iv) a first and a second autofluorescent protein flanking the second substrate, whereby the screening is performed through confocal fluorescence spectroscopy at substrate concentrations below 1 µM; or

(v) a fluorophore and a quencher molecule which are attached to opposite ends of the second substrate, whereby screening is performed through determination of the decrease in quenching of the fluorophore; or

(vi) a fluorophore or a chromophore and a binding moiety which are attached to opposite ends of the second substrate, whereby screening is performed through determination of binding of the binding moiety to a specific binding partner; or

(vii) a radioactive label and a binding moiety which are attached to opposite ends of the second substrate, whereby screening is performed through use of a scintillation proximity assay; or

(viii) any combination thereof.
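For the FRET and quencher formats of items (iii) and (v), the readout can be related to cleavage with the Förster relation E = 1 / (1 + (r/R0)^6): while the peptide is intact, donor and acceptor are close and transfer is efficient; after hydrolysis the fragments separate and transfer falls towards zero. The sketch below assumes that relation and a signal that varies linearly between intact and fully cleaved controls (no inner-filter effects); the distances, R0 and plate-reader values are hypothetical.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Förster transfer efficiency E = 1 / (1 + (r / R0)**6)."""
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


def fraction_cleaved(signal: float, intact_signal: float, cleaved_signal: float) -> float:
    """Linear interpolation of a donor (or de-quenched) fluorescence reading
    between fully intact and fully cleaved control samples."""
    return (signal - intact_signal) / (cleaved_signal - intact_signal)


# Hypothetical geometry: donor and acceptor 3 nm apart, R0 = 5 nm.
print(round(fret_efficiency(3.0, 5.0), 3))  # 0.955 while the peptide is intact
# Hypothetical plate-reader values (arbitrary units).
print(fraction_cleaved(550.0, intact_signal=100.0, cleaved_signal=1000.0))  # 0.5
```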

11. The method according to any one of claims 1 to 10, wherein

(i) the population of proteases is obtained through random nucleic acid mutagenesis, cassette mutagenesis, site-saturation mutagenesis, site-specific or random insertion and/or deletion mutagenesis, homologous in vitro recombination, homologous in vivo recombination, non-homologous recombination, or a combination thereof; and/or

(ii) the population of proteases is expressed using host cells, preferably of bacterial, yeast, insect, viral or mammalian origin, or using cell-free protein expression systems; and/or

(iii) the coupling of protease genotype and phenotype is achieved by use of sample carriers that enable compartmentalization of samples, and the genotypes are distributed into the sample carriers at a multiplicity per compartment that allows sufficient differentiation of phenotypes.
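If genotypes (for example cells or DNA molecules) are dispensed into compartments at random, the number per compartment follows a Poisson distribution with the mean loading as its parameter, which gives one way to choose the "multiplicity per compartment" of item (iii). The Poisson assumption and the helper names below are illustrative; the patent does not state a distribution or a target value.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability that a compartment receives exactly k genotypes."""
    return mean ** k * math.exp(-mean) / math.factorial(k)


def monoclonal_fraction(mean: float) -> float:
    """Fraction of occupied compartments that contain exactly one genotype."""
    return poisson_pmf(1, mean) / (1.0 - poisson_pmf(0, mean))


for mean in (0.1, 0.3, 1.0):
    print(f"mean={mean}: empty={poisson_pmf(0, mean):.3f}, "
          f"single among occupied={monoclonal_fraction(mean):.3f}")
```

A lower mean loading makes occupied compartments more likely to be monoclonal, so that each phenotype can be traced to one genotype, at the cost of more empty compartments.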

12. The method according to any one of claims 1 to 11, wherein the first protease is selected from the group of proteases consisting of serine proteases, cysteine proteases, aspartic proteases and metalloproteases, and wherein the first protease is preferably selected from the group of proteases consisting of Papain, Bromelain, Trypsin, Pepsin, Chymotrypsin, Subtilisin, SET, Human elastase, Cathepsin, Chymase, Saccharomycopsis fibuligera PEP I, Kallikrein, Urokinase, Thermolysin, Collagenase, Pseudomonas aeruginosa elastase, TEV protease, HIV-1 protease, BAR1 protease, Factor Xa, Thrombin, Tissue-type plasminogen activator, Kex2 protease, TVMV protease, RSV protease, MuLV protease, MPMV protease, MMTV protease, BLV protease, EIAV protease and SIVmac protease.

13. The method according to any one of claims 1 to 12, preferably according to any one of claims 5 to 12, wherein the target protease has a specificity similar to that of tissue-type plasminogen activator and cleaves the target substrate CPGR↓VVGG.

14. The method of claim 13, wherein the starting protease is BAR1 protease from S. cerevisiae and preferably the 
following second/intermediate substrates are utilized: 

(i) WLGLVPGG 

(ii) WLGQVPGG 

(iii) WLGRVPGG 

(iv) WLGRVVGG

(v) CPGRVVGG.
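The substrate series of claim 14 can be checked against claims 6 and 9 by counting exchanged residues: each step should exchange 1 to 5 residues, and the distance to the target should not grow. The sketch below reads (iv) and (v) as WLGRVVGG and CPGRVVGG, matching the target substrate of claim 13 and the French-language text of claim 14; the Hamming-distance criterion and the function names are illustrative assumptions.

```python
from typing import List, Tuple

# Claim 14 series, with (iv) and (v) read as WLGRVVGG and CPGRVVGG.
PATH = ["WLGLVPGG", "WLGQVPGG", "WLGRVPGG", "WLGRVVGG", "CPGRVVGG"]


def hamming(a: str, b: str) -> int:
    """Number of exchanged residues between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def check_path(path: List[str], max_exchanges: int = 5) -> Tuple[List[int], List[int], bool]:
    """Exchanges per step, distance of each substrate to the target (last entry),
    and whether every step exchanges 1..max_exchanges residues while the
    distance to the target never increases."""
    target = path[-1]
    steps = [hamming(a, b) for a, b in zip(path, path[1:])]
    to_target = [hamming(s, target) for s in path]
    ok = all(1 <= n <= max_exchanges for n in steps) and all(
        x >= y for x, y in zip(to_target, to_target[1:]))
    return steps, to_target, ok


print(check_path(PATH))  # ([1, 1, 1, 2], [4, 4, 3, 2, 0], True)
```

The plateau at distance 4 shows that WLGQVPGG is no closer to the target by sequence identity alone; its intermediate character comes from the properties of Gln relative to Leu and Arg, in the sense of claim 8 (iii).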



Claims (German version)

1. A method for identifying sequence-specific proteases with target-substrate specificities, comprising the following steps:

(a) providing a population of proteases containing variants of a first protease or variants or chimeras of two or more first proteases, said first protease having a substrate specificity for a particular amino acid sequence of a first peptide substrate;

(b) contacting said population of proteases with one or more second substrates comprising at least one specific amino acid sequence which is identical to the target peptide substrate, or which has an intermediate character with regard to the first substrate and the target substrate but is not present in the first peptide substrate; and

(c) selecting one or more protease variants from the population of proteases provided in step (a) having a specificity for said specific amino acid sequence of the second substrate provided in step (b), under conditions that allow the identification of proteases that preferentially recognize and hydrolyse said specific amino acid sequence within the second substrate, wherein the screening for protease activity is achieved by adding an excess of peptides other than the second peptide, the added peptides serving as competitors.

2. The method according to claim 1, wherein the selection conditions in step (c) further comprise

(i) screening for protease activity at low substrate concentrations, thereby increasing the affinity for the second substrate, and/or

(ii) screening for protease activity using two or more substrates in comparison, thereby increasing the selectivity of the enzyme.

3. The method according to claim 1 or 2, wherein steps (a) to (c) are repeated cyclically until one or more protease variants with specificity for the second substrate are identified, wherein the protease variants selected in one cycle are used as first proteases in the following cycle, and wherein at least one cycle and fewer than 100 cycles are carried out.

4. The method according to any one of claims 1 to 3, wherein only one second substrate is used in the one or more cycles, and wherein the second substrate is identical to the target substrate.

5. The method according to any one of claims 1 to 3, wherein different second substrates are used, wherein the second substrates have an intermediate character with regard to the first substrate and the target substrate, and wherein the second substrate used in the last cycle is identical to the target substrate.

6. The method according to claim 5, wherein different second substrates are used in consecutive cycles, and wherein each second substrate has an intermediate character with regard to the previously used second substrate and the target substrate.

7. The method according to claim 5 or 6, wherein in at least one cycle steps (b) to (c) are carried out in parallel with different second substrates, and wherein the protease variants isolated in this parallel manner are combined and used as first proteases for the next cycle.

8. The method according to any one of claims 5 to 7, wherein the intermediate character of the intermediate substrate is based on

(i) the amino acid composition,

(ii) the amino acid sequence,

(iii) the physical and/or chemical properties of the amino acid residues within the specific amino acid sequence, wherein preferably one or more properties from the group consisting of the following amino acid properties are used: the surface, the volume, the isoelectric point, the side-chain pKa, the charge, the polarity, the hydrophobicity, or

(iv) any combination thereof.

9. The method according to any one of claims 1 to 5, wherein the second substrates differ from the first substrates in that 1 to 5 amino acid residues within the specific amino acid sequence are exchanged.

10. The method according to any one of claims 1 to 9, wherein the second substrates carry functional groups that enable detection of the hydrolysis of the substrate, said functional groups being

(i) one or more fluorophores or chromophores whose spectroscopic properties change upon hydrolysis of the peptide, the screening being carried out by determining the changes in the spectroscopic properties; or

(ii) two fluorophores which are distinguishable by their fluorescence properties and are incorporated at opposite ends of the second substrate, the screening being carried out by confocal fluorescence spectroscopy at fluorophore concentrations below 1 µM; or

(iii) two fluorophores which form a fluorescence resonance energy transfer (FRET) pair and are incorporated at opposite ends of the second substrate, the screening being carried out by determining the decrease in energy transfer between the two fluorophores; or

(iv) a first and a second autofluorescent protein flanking the second substrate, the screening being carried out by confocal fluorescence spectroscopy at substrate concentrations below 1 µM; or

(v) a fluorophore and a quencher molecule which are incorporated at opposite ends of the second substrate, the screening being carried out by determining the decrease in quenching of the fluorophore; or

(vi) a fluorophore or a chromophore and a binding moiety which are incorporated at opposite ends of the second substrate, the screening being carried out by determining the binding of the binding moiety to a specific binding partner; or

(vii) a radioactive label and a binding moiety which are incorporated at opposite ends of the second substrate, the screening being carried out by means of a scintillation proximity assay; or

(viii) any combination thereof.

11. The method according to any one of claims 1 to 10, wherein

(i) the population of proteases is obtained by random nucleic acid mutagenesis, cassette mutagenesis, site-saturation mutagenesis, site-specific or random insertion and/or deletion mutagenesis, homologous in vitro recombination, homologous in vivo recombination, non-homologous recombination or a combination thereof; and/or

(ii) the population of proteases is expressed by host cells, preferably host cells of bacterial, yeast, insect, viral or mammalian origin, or by a cell-free protein expression system; and/or

(iii) the coupling of protease genotype and phenotype is achieved by using sample carriers that allow compartmentalization of the samples, and wherein the genotypes are distributed into the sample carriers at a multiplicity per reaction compartment that allows sufficient differentiation of phenotypes.

12. The method according to any one of claims 1 to 11, wherein the first protease is selected from the group of proteases consisting of serine proteases, cysteine proteases, aspartic proteases and metalloproteases, and wherein the first protease is preferably selected from the group of proteases consisting of papain, bromelain, trypsin, pepsin, chymotrypsin, subtilisin, SET, human elastase, cathepsin, chymase, Saccharomycopsis fibuligera PEP I, kallikrein, urokinase, thermolysin, collagenase, Pseudomonas aeruginosa elastase, TEV protease, HIV-1 protease, BAR1 protease, factor Xa, thrombin, tissue-type plasminogen activator, Kex2 protease, TVMV protease, RSV protease, MuLV protease, MPMV protease, MMTV protease, BLV protease, EIAV protease and SIVmac protease.

13. The method according to any one of claims 1 to 12, preferably according to any one of claims 5 to 12, wherein the target protease has a specificity similar to that of tissue-type plasminogen activator and cleaves the target substrate CPGR↓VVGG.

14. The method according to claim 13, wherein the starting protease is the BAR1 protease from S. cerevisiae and preferably the following second/intermediate substrates are used:

(i) WLGLVPGG

(ii) WLGQVPGG

(iii) WLGRVPGG

(iv) WLGRVVGG

(v) CPGRVVGG.


Claims (French version)

1. A method for identifying sequence-specific proteases with target-substrate specificities, comprising the following steps:

(a) providing a population of proteases composed of variants of a first protease or of variants or chimeras of two or more first proteases, said first proteases having a substrate specificity for a particular amino acid sequence of a first peptide substrate;

(b) contacting said population of proteases with one or more second substrates comprising at least one specific amino acid sequence that is identical to the target peptide substrate or has an intermediate character with regard to the first substrate and the target substrate, but is not present within the first peptide substrate; and

(c) selecting one or more protease variants from the population of proteases provided in step (a) having a specificity for said specific amino acid sequence of the second substrates provided in step (b), under conditions that allow identification of proteases that preferentially recognize and hydrolyse said specific amino acid sequence within the second substrates, wherein the screening for protease activity is carried out by adding an excess of peptides other than the second peptide, the added peptides thereby being used as competitors.

2. The method according to claim 1, wherein the selection conditions in step (c) further comprise

(i) screening for protease activity at low substrate concentrations, thereby increasing the affinity for the second substrate, and/or

(ii) screening for protease activity using two or more substrates in comparison, thereby increasing the selectivity of the enzyme.

3. The method according to claim 1 or 2, wherein steps (a) to (c) are repeated cyclically until one or more protease variants with a specificity for the second substrate are identified, wherein the protease variants selected in one cycle are used as first proteases in the following cycle, and wherein at least one cycle and fewer than 100 cycles are performed.

4. The method according to any one of claims 1 to 3, wherein a single second substrate is used in the one or more cycles, and wherein the second substrate is identical to the target substrate.

5. The method according to any one of claims 1 to 3, wherein different second substrates are used, wherein the second substrates have an intermediate character with regard to the first substrate and the target substrate, and wherein the second substrate used in the last cycle is identical to the target substrate.

6. The method according to claim 5, wherein different second substrates are used in consecutive cycles, and wherein each second substrate has an intermediate character with regard to the previously used second substrate and the target substrate.


7. The method according to claim 5 or 6, wherein in at least one cycle steps (b) to (c) are carried out in parallel with different second substrates, and wherein the protease variants isolated in this parallel manner are combined and used as first proteases in the following cycle.

8. The method according to any one of claims 5 to 7, wherein the intermediate character of the intermediate substrates is based on

(i) the amino acid composition,

(ii) the amino acid sequence,

(iii) the physical and/or chemical properties of the amino acid residues within the specific amino acid sequence, one or more properties from the group consisting of the following amino acid properties preferably being used: the surface, the volume, the isoelectric point, the side-chain pKa, the charge, the polarity, the hydrophobicity, or

(iv) any combination thereof.


9. The method according to any one of claims 1 to 5, wherein the second substrates differ from the first substrates in that 1 to 5 of the amino acid residues within the specific amino acid sequence are exchanged.

10. The method according to any one of claims 1 to 9, wherein the second substrates carry functional groups that allow detection of the hydrolysis of the substrate, said functional groups being

(i) one or more fluorophores or chromophores whose spectroscopic properties change upon hydrolysis of the peptide, the screening thereby being carried out by determining the change in the spectroscopic properties; or

(ii) two fluorophores which are distinguishable by their fluorescence properties and which are attached to opposite ends of the second substrate, the screening thereby being carried out by confocal fluorescence spectroscopy at fluorophore concentrations below 1 µM; or

(iii) two fluorophores which form a fluorescence resonance energy transfer (FRET) pair and which are attached to opposite ends of the second substrate, the screening thereby being carried out by determining the reduction in energy transfer between the two fluorophores; or

(iv) a first and a second autofluorescent protein flanking the second substrate, the screening thereby being carried out by confocal fluorescence spectroscopy at substrate concentrations below 1 µM; or

(v) a fluorophore and a quencher molecule which are attached to opposite ends of the second substrate, the screening thereby being carried out by determining the reduction in quenching of the fluorophore; or

(vi) a fluorophore or a chromophore and a binding group which are attached to opposite ends of the second substrate, the screening thereby being carried out by determining the binding of the binding group to a specific binding partner; or

(vii) a radioactive label and a binding group which are attached to opposite ends of the second substrate, the screening thereby being carried out by use of a scintillation proximity assay; or

(viii) any combination thereof.

11. The method according to any one of claims 1 to 10, wherein

(i) the population of proteases is obtained by random nucleic acid mutagenesis, cassette mutagenesis, site-saturation mutagenesis, site-directed or random insertion and/or deletion mutagenesis, homologous in vivo recombination, non-homologous recombination, or a combination thereof; and/or

(ii) the population of proteases is expressed using host cells, preferably of bacterial, yeast, insect, viral or mammalian origin, or using cell-free protein expression systems; and/or

(iii) the coupling of protease genotype and phenotype is achieved by using sample carriers that allow compartmentalization of the samples, and the genotypes are distributed into the sample carriers at a multiplicity per compartment that allows sufficient differentiation of phenotypes.

12. The method according to any one of claims 1 to 11, wherein the first protease is selected from the group of proteases consisting of serine proteases, cysteine proteases, aspartic proteases and metalloproteases, and wherein the first protease is preferably selected from the group of proteases consisting of papain, bromelain, trypsin, pepsin, chymotrypsin, subtilisin, SET, human elastase, cathepsin, chymase, Saccharomycopsis fibuligera PEP I, kallikrein, urokinase, thermolysin, collagenase, Pseudomonas aeruginosa elastase, TEV protease, HIV-1 protease, BAR1 protease, factor Xa, thrombin, tissue-type plasminogen activator, Kex2 protease, TVMV protease, RSV protease, MuLV protease, MPMV protease, MMTV protease, BLV protease, EIAV protease and SIVmac protease.

13. The method according to any one of claims 1 to 12, preferably according to any one of claims 5 to 12, wherein the target protease has a specificity similar to that of tissue-type plasminogen activator and cleaves the target substrate CPGR↓VVGG.

14. The method according to claim 13, wherein the starting protease is the BAR1 protease from S. cerevisiae and preferably the following second substrates/intermediate substrates are used:

(i) WLGLVPGG

(ii) WLGQVPGG

(iii) WLGRVPGG

(iv) WLGRVVGG

(v) CPGRVVGG